Estimating the loss of economic predictability from aggregating firm-level production networks

Abstract To estimate the reaction of economies to political interventions or external disturbances, input–output (IO) tables—constructed by aggregating data into industrial sectors—are extensively used. However, economic growth, robustness, and resilience crucially depend on the detailed structure of nonaggregated firm-level production networks (FPNs). Due to nonavailability of data, little is known about how much aggregated sector-based and detailed firm-level-based model predictions differ. Using a nearly complete nationwide FPN, containing 243,399 Hungarian firms with 1,104,141 supplier–buyer relations, we self-consistently compare production losses on the aggregated industry-level production network (IPN) and the granular FPN. For this, we model the propagation of shocks of the same size on both, the IPN and FPN, where the latter captures relevant heterogeneities within industries. In a COVID-19 inspired scenario, we model the shock based on detailed firm-level data during the early pandemic. We find that using IPNs instead of FPNs leads to an underestimation of economic losses of up to 37%, demonstrating a natural limitation of industry-level IO models in predicting economic outcomes. We ascribe the large discrepancy to the significant heterogeneity of firms within industries: we find that firms within one sector only sell 23.5% to and buy 19.3% from the same industries on average, emphasizing the strong limitations of industrial sectors for representing the firms they include. Similar error levels are expected when estimating economic growth, CO2 emissions, and the impact of policy interventions with industry-level IO models. Granular data are key for reasonable predictions of dynamical economic systems.

features of PNs has yielded fascinating insights. PNs determine and constrain the paths of future economic growth of regions and countries (Hidalgo et al., 2007; Neffke et al., 2011). The position of industries within the national and international PN is predictive of price trends, changes in productivity, and future economic growth (McNerney et al., 2022). The ways economic shocks affect the agents of an economy depend on the PN. Firms that fail might be essential suppliers for other firms, which have to stop their production as a consequence (Ivanov et al., 2014; Yan et al., 2015). Consequently, production disruptions can cascade, similarly to financial contagion (Battiston et al., 2012; Glasserman and Young, 2016; Diem et al., 2020; Thurner, 2022). In this context it is important to mention that PNs can amplify micro-level sector shocks and cause fluctuations of macro-economic relevance (Acemoglu et al., 2012; Carvalho and Tahbaz-Salehi, 2019; Moran and Bouchaud, 2019). The COVID-19 pandemic showed that models utilizing PNs can produce high-quality forecasts of the economic effects of lockdowns (Pichler et al., 2022). Tightly connected to shock propagation in PNs is the topic of the resilience of countries and industries with respect to economic shocks (Henriet et al., 2012; Contreras and Fagiolo, 2014; Klimek et al., 2019; Han and Goetz, 2019; Martin and Sunley, 2015).
Figure 1: Schematic demonstration of the origin of errors in production loss estimates when aggregating firm-level production networks (FPN) to the industry level (IPN). a) Shock propagation on the IPN in response to a 25% initial disruption of industry 2 (blue X) resulting in a 25% production loss (blue bar marks 25% reduction); scenario 1. The sector disruption can originate from various combinations of shocks on the level of the firms $f^2_3$, $f^2_4$, $f^2_5$. The initial shock spreads downstream to sector 3 and leads to a 25% production loss, and further to sector 1 (12.5% loss) and sector 5 (16.7% loss). b) Shock propagation on the FPN in response to a 100% disruption of firm 3 (red X, red bar); scenario 2. The disruption propagates downstream (red edge) to firm 6 (50% production loss), and further to firms 1 and 2 (25% loss). Other nodes are not affected (0% loss, empty bars). c) Shock propagation on the FPN in response to a different 100% disruption of firm 5 (red X, red bar), resulting in a 0% production level; scenario 3. The disruption propagates downstream (red edge) to firm 7 (50% loss) and to firms 10 (50% loss) and 11 (25% loss). Other nodes are not affected. d) Comparison of industry-specific production losses, $L_k$, (y-axis) for industries 1, 2, 3, 4, and 5 (x-axis), in response to the aggregated 25% disruption of sector 2 (blue '+') and the two 100% firm-level shocks of firms 3 and 5 (red squares and circles). Note that both firm-level shock scenarios lead to the same aggregated 25% shock to industry 2. The production losses of industries 2, 3, and 4 are 0.25, 0.25, and 0, respectively, for all three cascades, so the symbols '+', circle, and square overlap. However, the output losses of sectors 1 and 5 are remarkably different for the three different shocks (symbols do not overlap). FPN-losses are seen to vary from 0 to 0.25 for sector 1, and from 0 to 0.33 for sector 5, whereas the IPN-losses are the same for both firm-level failure scenarios: 0.125 for sector 1, and 0.167 for sector 5. Remarkably, the IPN-loss deviates from the FPN-loss estimates by about 100%.
PNs are directly linked with the environment and the climate crisis (Willner et al., 2018); they determine the CO2 emission levels of industries and countries (Wiedmann, 2009; Davis and Caldeira, 2010; Wiedmann et al., 2015), and in the other direction, natural disasters may lead to direct and indirect economic damages that need to be quantified (Hallegatte, 2008; Otto et al., 2017; Colon et al., 2021). Finally, PNs are an integral part of national accounting of almost every economy (Horowitz et al., 2006; Eurostat, 2008; Mahajan et al., 2018), and serve as essential inputs for growth forecasts, employment projections, and estimates for policy interventions.
However, the PNs behind these insights are generally accessible only on an aggregated level in the form of industry-level input-output tables (IOTs) that record how the entire output of one industry enters as a production input into other industries. For almost a century, IOTs have been used to represent countries' PNs (Leontief, 1936; Miller and Blair, 2009). They are widely available and highly standardised (Eurostat, 2008; Mahajan et al., 2018), so that they can be globally connected (Dietzenbacher et al., 2013; Yamano and Ahmad, 2006), thus enabling the study of global PNs (Timmer et al., 2015; Otto et al., 2017; Klimek et al., 2019). Typically, the dimensionality of IOTs ranges from 56 industries, e.g., in the world input output database 2016 release (Dietzenbacher et al., 2013; Timmer et al., 2015), to 405 sectors, as in the US-American economy (Bureau of Economic Analysis, benchmark input-output statistics) (Horowitz et al., 2006). Industry-level IOTs are a cornerstone of economic research and modelling. However, industry-level production networks (IPNs), such as IOTs, are highly aggregated representations of the economy and cannot capture the details of the supply-chain relations between firms. The aim of this paper is to demonstrate that these details (manifesting themselves in significant inhomogeneities) are often essential, and their omission can be a source of considerable errors in economic predictions.
Studying firm-level production networks (FPNs) has been almost impossible until recently, when large-scale FPNs that include (almost) all firms and (almost) all their supply links have become available for countries such as Japan (Fujiwara and Aoyama, 2010) (1.1 million firms, 5.5 million links), Belgium (Dhyne et al., 2015) (0.8 million firms, 17.3 million links), or Hungary (Borsos and Stancsics, 2020) (0.25 million firms, 1.2 million links); for a review see (Bacilieri et al., 2022). Subsequently, new methods have been developed to reconstruct FPNs (Brintrup et al., 2020; Wichmann et al., 2020; Reisch et al., 2022; Ialongo et al., 2022; Kosasih and Brintrup, 2022; Mungo et al., 2023; Mungo and Moran, 2023). Based on this firm-firm supply network data, novel insights have been gained on the effects of shock propagation after natural disasters (Inoue and Todo, 2019; Carvalho et al., 2020) and on interactions of the financial system with the FPN (Demir et al., 2022; Huremovic et al., 2020; Borsos and Mero, 2020), and quantifying the systemic risk contributions of individual firms in an economy has become possible (Diem et al., 2022). Further, the importance of indirect exposures of firms to imports and exports through the FPN was shown in (Dhyne et al., 2021), the origins of firm-size heterogeneity were identified (Bernard et al., 2022), and the question of how price changes (inflation) propagate through the FPN was addressed (Duprez and Magerman, 2018).
Aggregating FPNs containing millions of firms to IPNs consisting of a few dozen industries leads to a massive loss of information on production processes and to possibly substantial biases, as was the case even when aggregating (the already aggregated) IOTs (Kymn, 1990; Su et al., 2010; Lenzen, 2011). Before we illustrate two severe problems that emerge when aggregating firms and their supply relations into IPNs, we specify the necessary notation.
The IPN consisting of $m$ industries is represented by the weighted directed adjacency matrix, $Z$, where a link, $Z_{kl}$, denotes the sales of goods or services (price times quantity) from industry $k$ to industry $l$ for a given time period. Figure 1a shows an example, $Z$, with $m = 5$ industry sectors, where, e.g., industry 3 buys inputs needed for its production process from industry 2 and sells its output to sectors 1 and 5. Colors represent the different industries and link weights indicate sales volume. Figure 1b shows the corresponding FPN, $W$, with $n = 11$ firms. A link, $W_{ij}$, denotes the sales of firm $i$ to firm $j$ for the same time period. Every firm, $i$, belongs to one of the $m$ industries, specified by the $i$th element of the industry classification vector, $p$, where $p_i \in \{1, 2, \ldots, m\}$. In the example, firm $i$ within industry $k$ ($p_i = k$) is denoted by $f^k_i$, and, e.g., firms $f^2_3$, $f^2_4$, and $f^2_5$ of sector 2 sell to firms $f^3_6$ and $f^3_7$ of sector 3. Due to data constraints we assume that each firm $i$ only produces one product, corresponding to its industry classification, $p_i$, as in (Henriet et al., 2012; Inoue and Todo, 2019; Diem et al., 2022). We construct the IPN, $Z$, by aggregating all product flows between firms from the respective industries, e.g., $Z_{23} = W_{36} + W_{46} + W_{47} + W_{57}$, and more generally

$$Z_{kl} = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}\, \delta_{p_i,k}\, \delta_{p_j,l},$$

where $\delta$ denotes the Kronecker delta.$^1$ The total sales of firm $i$ to all other firms in the FPN are measured by its out-strength, $s^{\mathrm{out}}_i = \sum_{j=1}^{n} W_{ij}$. It is a proxy for firm $i$'s output (amount produced). The in-strength, $s^{\mathrm{in}}_i = \sum_{j=1}^{n} W_{ji}$, represents all purchases of $i$ from other firms.

$^1$ Official IOTs are constructed differently and are based on surveys and other data sources (Eurostat, 2008; Miller and Blair, 2009).
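The aggregation from the FPN to the IPN and the two strength measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the study: the function names and the 4-firm weight matrix are hypothetical (the link weights of Fig. 1 are not given in the text), and matrices are stored as plain nested lists.

```python
def aggregate_to_ipn(W, p, m):
    """Sum firm-to-firm sales W[i][j] into industry-to-industry sales Z[k][l].

    p[i] is the industry (1..m) of firm i, as in the classification vector p.
    """
    Z = [[0.0] * m for _ in range(m)]
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            Z[p[i] - 1][p[j] - 1] += w  # shift 1-based industry labels to 0-based indices
    return Z


def out_strength(W):
    """Total sales of each firm (row sums of W)."""
    return [sum(row) for row in W]


def in_strength(W):
    """Total purchases of each firm (column sums of W)."""
    n = len(W)
    return [sum(W[j][i] for j in range(n)) for i in range(n)]


# Hypothetical toy network: firms 0 and 1 belong to industry 1, firms 2 and 3 to industry 2.
W = [
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]
p = [1, 1, 2, 2]
Z = aggregate_to_ipn(W, p, m=2)
print(Z)                # industry-level sales matrix
print(out_strength(W))  # firm-level output proxies
print(in_strength(W))   # firm-level purchases
```

In this toy case the three links from industry 1 to industry 2 collapse into the single entry `Z[0][1]`, which is exactly the information loss discussed in the text: the industry-level matrix no longer shows which firm sells to which.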
Problem 1: Aggregated industries are not representative. Figure 1b demonstrates how aggregation causes the first problem. Firms $f^3_6$ and $f^3_7$ of sector 3 have no overlap in their customers' industries; $f^3_6$ sells only to firms in sector 1, and $f^3_7$ sells only to firms in sector 5. Aggregation to the industry level erases this information and industry 3 sells equally to industry 1 and industry 5; see Fig. 1a. This means that the output vector of industry 3 is not representative of the output vectors of the firms it contains. Similarly, the IPN, $Z$, is not representative of the FPN, $W$.
Problem 2: Aggregation mis-estimates economic dynamics. The second problem is that aggregation leads to a mis-estimation of firm-level economic dynamics. Figure 1 illustrates how the mis-estimation of production losses arises by comparing the same production shock propagating on the industry-level network, $Z$, and the firm-level network, $W$. We compare three scenarios. Figure 1a shows scenario 1, a 25% initial disruption of industry 2 (blue X), at time $t = 1$. The production of sector 2 drops by 25% (indicated by the bar to the right filled 25% blue), and the production level, $h_2(t)$, is $h_2(1) = \phi_2 = 0.75$. This initial shock is specified by the vector of remaining production levels, $\phi = (1, 0.75, 1, 1, 1)$. Then, the shock spreads downstream (blue edge) to sector 3 at $t = 2$ (25% production loss, $h_3(2) = 0.75$), and at $t = 3$ to sectors 1 (12.5% production loss, $h_1(3) = 0.875$) and 5 (16.7% production loss, $h_5(3) = 0.833$).$^2$ The shock propagates, as industries 3, 1, and 5 lack inputs for their production processes. Note that the 25% disruption of industry 2 could originate from various combinations of individual shocks to firms $f^2_3$, $f^2_4$, and $f^2_5$ in industry 2. Figure 1b shows scenario 2, the 100% disruption of firm 3, $f^2_3$ (red X, red bar). The production of firm 3 drops to 0%, i.e., a total operational failure ($h_3(1) = \psi^1_3 = 0$). The firm-level shock is specified by the remaining production level vector $\psi^1$, where $\psi^1_3 = 0$ and $\psi^1_i = 1$ for all $i \neq 3$. The disruption propagates downstream (red edge) to $f^3_6$ (50% production loss, $h_6(2) = 0.5$), and further to firms 1 and 2 (25% production loss, $h_1(3) = h_2(3) = 0.75$). Aggregating the production losses of firms yields a loss of 25% for industries 1, 2, and 3 and a 0% loss for industries 4 and 5.
Figure 1c shows scenario 3, the propagation of a 100% disruption of firm 5, $f^2_5$ (red X, red bar). Aggregating the resulting production losses yields a loss of 25% for industries 2 and 3, a 0% loss for industries 1 and 4, and a 33% loss for industry 5. Figure 1d compares for each industry, $k$, (x-axis) the industry-specific production loss, $L_k$, (y-axis) across the three scenarios: the 25% shock to sector 2 (blue '+'), the 100% shock to firm 3 (red squares), and the 100% shock to firm 5 (red circles). When aggregated, both firm-level shocks yield the industry-level shock of a 25% disruption of industry 2, and for industries 3 and 4 the production losses from shock propagation are also the same (0.25 and 0, respectively), so the symbols '+', circle, and square overlap. However, the output losses of sectors 1 and 5 are vastly different across the three shocks; here '+', circle, and square do not overlap. The FPN-based losses vary from 0 to 0.25 for sector 1 and from 0 to 0.33 for sector 5, whereas the aggregation-based IPN losses are the same for both firm-level shocks, 0.125 for sector 1 and 0.167 for sector 5. The IPN-based loss mis-estimates the FPN-based losses by 100%. Other network dynamics such as growth, innovation, or productivity spillovers, which happen to a large extent at the firm level and not the industry level, are potentially affected in similarly drastic ways.
In this paper we quantify the relevance of these two problems by utilizing a unique data set that allows us to observe almost every firm-level supply chain relation of the entire production network of Hungary, containing 243,399 firms and 1,104,141 links in 2019, see Data and Methods. First, we assess how representative industry-level production networks are of real-world firm-level production networks. We do that by quantifying the intra-sector overlaps of firms' input and output vectors. Second, we quantify the estimation errors of economy-wide and industry-specific production losses that arise when using industry-level production networks to approximate firm-level shock propagation dynamics. Firm-level labor data with monthly time resolution enables us to realistically estimate the size of the COVID-19 shock for individual firms in the beginning of 2020. Then, we compare the production losses from propagating a realistic COVID-19 shock and 1,000 synthetic shock realizations, either on the firm-level or the industry-level production network. We sample the synthetic shocks such that they are of the same size when aggregated to the industry level, but affect firms within industries differently. This feature allows us to clearly show the effects of intra-sector heterogeneity in firms' input-output vectors for estimating production losses, while controlling for size and industry effects.

Quantifying input and output vector overlaps of firms
Large overlaps (firms within sectors are similar) would suggest that aggregation to the industry level does not lead to large distortions of network dynamics. Small overlaps (firms within sectors are heterogeneous) would lead to potentially large aggregation effects. First, we aggregate for every firm its firm-level input and output vector to the industry level (NACE2), see SI Section 1. Second, for each pair of firms, $i$ and $j$, within a given NACE2 industry we calculate the input overlap coefficient (IOC) and the output overlap coefficient (OOC) as

$$\mathrm{IOC}_{ij} = \sum_{k=1}^{m} \min\left[\Pi^{\mathrm{in}}_{ki},\, \Pi^{\mathrm{in}}_{kj}\right],$$

$$\mathrm{OOC}_{ij} = \sum_{k=1}^{m} \min\left[\Pi^{\mathrm{out}}_{ik},\, \Pi^{\mathrm{out}}_{jk}\right],$$

where $m$ is the number of NACE2 industries (here 86), and $\Pi^{\mathrm{in}}_{\bullet i}$ and $\Pi^{\mathrm{out}}_{i \bullet}$ are the normalized input and output vectors of firm $i$, respectively, see SI Section 1. $\mathrm{IOC}_{ij}$ specifies the fraction of total inputs that $i$ and $j$ buy from the same industries. It quantifies the common exposure of $i$ and $j$ to supply shocks originating from the same upstream industries and indicates the fraction of a demand shock that is forwarded by $i$ and $j$ to the same upstream industries. $\mathrm{OOC}_{ij}$ specifies the fraction of total sales that $i$ and $j$ sell to the same industries. It quantifies the common exposure of $i$ and $j$ to demand shocks originating from the same downstream industries and indicates the fraction of a shock that is forwarded by $i$ and $j$ to the same downstream industries. For more information, see SI Section 2. In Fig. 1b, the relative input vector is $\Pi^{\mathrm{in}}_{\bullet 10} = (0, 0, 0.5, 0.5, 0)$ for firm 10 and $\Pi^{\mathrm{in}}_{\bullet 11} = (0, 0, 1, 0, 0)$ for firm 11, hence $\mathrm{IOC}_{10,11} = 0.5$. The propagation of upstream shocks by firms 10 and 11 will only overlap by 50% (sector 3), while 50% spreads to distinct sectors.
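The worked value for firms 10 and 11 can be reproduced with a short sketch. It assumes that the overlap coefficient is the sum of element-wise minima of two normalized industry-level vectors, a form that matches the verbal definition ("fraction of total inputs bought from the same industries") and the stated result of 0.5; the exact definition is given in SI Section 1 and 2 and is not reproduced here. The function names are hypothetical.

```python
def normalize(v):
    """Scale a non-negative vector so that its entries sum to 1 (if possible)."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else list(v)


def overlap(u, v):
    """Overlap coefficient of two normalized vectors: sum of element-wise minima.

    ASSUMPTION: this functional form is inferred from the text, not quoted from it.
    """
    return sum(min(a, b) for a, b in zip(u, v))


# Relative input vectors of firms 10 and 11 from the Fig. 1b example.
pi_in_10 = [0.0, 0.0, 0.5, 0.5, 0.0]
pi_in_11 = [0.0, 0.0, 1.0, 0.0, 0.0]

ioc_10_11 = overlap(pi_in_10, pi_in_11)
print(ioc_10_11)  # 0.5, the shared input share from sector 3
```

Under this form the coefficient is 1 for identical vectors and 0 for vectors with disjoint support, which is the scale on which the distributions in the following paragraphs are read.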
Firms within industries are highly different. We show the distribution of the pairwise similarities $\mathrm{IOC}_{ij}$ and $\mathrm{OOC}_{ij}$ for all firms in NACE2 industry C26, 'Manufacture of computer, electronic and optical products', in Fig. 2. Figure 2a-d show the $\mathrm{IOC}_{ij}$ distributions stratified by the firms' number of suppliers (in-degree, $k^{\mathrm{in}}_i$). Figure 2a contains all firms that have 1 to 5 suppliers, Fig. 2b 6 to 15, Fig. 2c 16 to 35, and Fig. 2d more than 35. The average similarity of firms' input vectors is small across all four groups, for which the median (vertical solid line) and mean (dashed line) overlaps are 0, 0.121, 0.199, 0.343, and 0.141, 0.196, 0.239, 0.343, respectively. Clearly, the average similarity of input vectors is increasing for firms with more suppliers. The distribution for firms with one to five suppliers (Fig. 2a) is bi-modal: most pairs of firms have either almost no overlap or almost perfect overlap. For firms with a few suppliers (Fig. 2b-c) the distributions become unimodal and right skewed, implying that very high similarities appear in the right tail, but are not very frequent. Finally, the distribution of input overlaps for firms with more than 35 suppliers is centered around 0.34 (Fig. 2d). Figure 2e-h show the distribution of the pairwise output overlap coefficients, $\mathrm{OOC}_{ij}$, grouped according to the firms' number of buyers (out-degree, $k^{\mathrm{out}}_i$). The bin sizes are the same as before. The average similarity of output vectors is visibly smaller than that of input vectors. The median and mean overlaps for the respective out-degree bins are 0, 0.025, 0.119, 0.119 and 0.054, 0.087, 0.169, 0.143, respectively. The distributions are more concentrated towards low overlaps and remain right skewed for all out-degree bins.
Similarity of firms is low and varies across industries. We now show the summary statistics of the pairwise $\mathrm{IOC}_{ij}$ and $\mathrm{OOC}_{ij}$ distributions for all NACE2 industries in Fig. 3, in particular the mean and the 5%, 25%, 50% (median), 75%, and 95% percentiles. Only firms with more than 35 suppliers and buyers are included. The x-axis shows the 86 NACE2 codes present; the y-axis represents the overlap coefficients; each boxplot corresponds to one NACE2 class. A boxplot is missing if less than 2 firms exist in the respective sector and degree bin. Figure 3a shows that the low input overlaps of industry C26 are not just an outlier. The mean of the mean (median) input overlaps, $\mathrm{IOC}_{ij}$, across NACE2 industries is 0.35 (0.33) and the standard deviation of mean (median) input overlaps is 0.084 (0.102). This indicates that relatively low input overlaps are the norm with few outliers. The highest median $\mathrm{IOC}_{ij}$ are found in the 'agricultural industry' (A1-A2), 'water collection, treatment and supply' (E36) and in the 'transport' sectors (H53), whereas the lowest median $\mathrm{IOC}_{ij}$ are found in service sectors, such as 'other professional, scientific and technical activities' (M74), 'travel agency, and related activities' (N79), 'sports activities and amusement and recreation activities' (R93) and 'activities of membership organisations' (S94). The average standard deviation is 0.156. The standard deviation of standard deviations is small (0.048), and the length of error bars appears to be relatively homogeneous across sectors, suggesting that the variation of pairwise input overlaps, $\mathrm{IOC}_{ij}$, is relatively constant across sectors. Figure 3b shows that output overlaps, $\mathrm{OOC}_{ij}$, are on average lower than the input overlaps, but have a higher variation across industries. The mean of the mean (median) output overlaps, $\mathrm{OOC}_{ij}$, across all NACE2 industries is 0.282 (0.257) and the standard deviation of mean (median) output overlaps is 0.147 (0.161), indicating that relatively low output overlaps are the norm with several outliers. For more details, see SI Section 3.
In SI Section 4 we show that for the degree bins 1-5, 6-15, and 16-35 the means over the mean (median) input overlaps are 0.132 (0.009), 0.202 (0.148), and 0.269 (0.241), respectively; the respective values for output overlaps are slightly lower. As for industry C26, input and output vectors of firms within industries generally become more homogeneous with the number of suppliers and buyers. In SI Section 5 we show the same analysis for NACE4 industries based on NACE4-level input-output vectors and find that the intra-sector variation of input-output vectors is higher than at the NACE2 level. In SI Section 6, we show that our results are robust with respect to the choice of the similarity measure. In SI Section 7 we show that the similarity of input and output vectors of firms over time is substantially higher than intra-industry similarities. Individual firms show significant similarity from one year to the next, as expected, while the observed low level of intra-industry similarities captures fundamental heterogeneities.
Overall, we clearly see that input and output overlaps of firms within industries are surprisingly low, across industries and across degree bins. The high level of heterogeneity of input-output vectors of firms within industries shows that for most industries sector-level aggregates are practically not representative of the actual firm-level supply chain inter-linkages and very likely will mis-represent dynamic processes occurring on the firm-level network.

Figure 3: a) Intra-industry input overlap coefficients, $\mathrm{IOC}_{ij}$. The average of the mean (median) input overlaps across all NACE2 industries is 0.35 (0.33) and the standard deviation of mean (median) input overlaps is 0.084 (0.102). The average standard deviation is 0.156. Relatively low input overlaps are the norm with few outliers such as 'agricultural industry' (A1-A2), 'water collection, treatment and supply' (E36) and 'transport' (H53). b) Intra-industry output overlap coefficients, $\mathrm{OOC}_{ij}$. The average of the mean (median) output overlaps across all NACE2 industries is 0.282 (0.257) and the standard deviation of mean (median) output overlaps is 0.147 (0.161). Again we see small overlaps. Output overlaps are on average lower than the input overlaps, but there appears to be more variation across industries. If industry-level aggregation were fully representative of the IO-vectors of firms, in both panels all distributions would correspond to a single bar at an overlap value of 1. Not a single industry is even close to that value; the highest similarities are found for sectors such as Veterinary activities (M75), Manufacture of beverages (C11), and Manufacture of other transport equipment (C30).

Production loss mis-estimations from aggregating networks
We now compare the economy-wide production losses for Hungary caused by a COVID-19 shock propagating once on the firm-level production network (FPN), and once on the industry-level production network (IPN). Based on firms' actual employment reductions, the shock realistically captures how individual firms were affected by COVID-19 in the beginning of 2020. The shock is represented by the vector, $\zeta$, where $\zeta_i$ is the relative reduction of firm $i$'s labor input from January to May 2020, $\zeta_i = \max[0,\, 1 - e_i(\mathrm{May})/e_i(\mathrm{Jan})]$, and $e_i$ is the number of $i$'s employees in the respective month. The remaining production capacities of firms (after the shock) are given by the vector $\psi = 1 - \zeta$, where $\psi_i \in [0, 1]$ is the remaining fraction of firm $i$'s production, e.g., if $i$ reduced its employees by 20%, its remaining capacity is $\psi_i = 0.8$. Aggregating the capacities, $\psi_i$, of all firms $i$ in sector $k$ gives sector $k$'s remaining production capacity, $\phi_k$. For details on shock construction and aggregation, see Data and Methods.
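The shock construction can be illustrated with a minimal sketch. The first two functions follow the formulas stated above. The aggregation step is an assumption: the text defers the exact rule to Data and Methods, so the sketch uses an out-strength-weighted average of firm capacities as a stand-in. All names and numbers are hypothetical, and January employment is assumed to be positive.

```python
def labor_shock(e_jan, e_may):
    """zeta_i = max(0, 1 - e_i(May) / e_i(Jan)); firms that grew get a shock of 0."""
    return [max(0.0, 1.0 - may / jan) for jan, may in zip(e_jan, e_may)]


def remaining_capacity(zeta):
    """psi = 1 - zeta, element-wise."""
    return [1.0 - z for z in zeta]


def aggregate_capacity(psi, s_out, p, m):
    """Sector-level remaining capacity phi_k.

    ASSUMPTION: out-strength-weighted mean of psi_i over firms in sector k;
    the rule actually used is specified in Data and Methods.
    """
    weighted = [0.0] * m
    total = [0.0] * m
    for psi_i, s, k in zip(psi, s_out, p):
        weighted[k - 1] += psi_i * s
        total[k - 1] += s
    return [w / t if t > 0 else 1.0 for w, t in zip(weighted, total)]


# Hypothetical data: three firms, the first two in sector 1, the third in sector 2.
e_jan = [10, 20, 5]
e_may = [8, 25, 5]
s_out = [100.0, 300.0, 50.0]
p = [1, 1, 2]

zeta = labor_shock(e_jan, e_may)  # the first firm lost 20% of its staff
psi = remaining_capacity(zeta)
phi = aggregate_capacity(psi, s_out, p, m=2)
print(zeta, psi, phi)
```

The second firm increased employment, so the `max` clips its shock to zero; the sketch therefore never produces capacities above 1, consistent with $\psi_i \in [0, 1]$.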
Figure 4: Economy-wide production losses, $L$, obtained from an empirically calibrated and 1,000 synthetic COVID-19 shocks propagating on the aggregated industry-level production network, IPN, (blue dashed line) and on the firm-level production network, FPN, (red line, histogram). The FPN and IPN correspond to the production network of Hungary in 2019; the firm-level shock, $\psi$, corresponds to firms reducing their production level proportional to their reduction in employees between January and May 2020, and is taken from monthly firm-level labor data. The NACE2 level shock, $\phi$, is the aggregation of $\psi$. The 1,000 synthetic shocks, $\Psi$, are sampled such that (when they are aggregated to the NACE2 level) they all have the same size as $\phi$. The empirically calibrated shock, $\psi$, yields an FPN-based loss, $L_{\mathrm{firm}}(\psi)$, of 11.5% (red line). The synthetic shocks yield a distribution of FPN-based production losses, $L_{\mathrm{firm}}(\Psi)$, ranging from 10.5% to 15.3% of national output (histogram). The median is 11.7% (see boxplot). As a reference, the Hungarian GDP declined by 14.2% in Q2 2020. Note that for the IPN all realizations, $\Psi$, result in the same production loss, $L_{\mathrm{ind.}}(\phi)$, of 9.6%, by construction. The aggregation to the IPN causes a substantial underestimation of the FPN-based production losses.

Following the COVID-19 shock, we simulate how the adaptation of firms' supply and demand levels propagates downstream and upstream along the PN, once on the firm level and once on the industry level. We employ the simulation model of (Diem et al., 2022), where each firm (industry) is equipped with a generalized Leontief production function, see Data and Methods for details. The simulation stops when the production levels of firms have reached a new stationary state at (model-internal) time, $T$. Every firm $i$ (or sector $k$) has a final production level, $h_i(T, \psi) \in [0, 1]$ ($h_k(T, \phi) \in [0, 1]$), that depends explicitly on the details of the shock $\psi$ ($\phi$). It represents the fraction of the original production, $s^{\mathrm{out}}_i$, that firm $i$ (sector $k$) maintains after the shock has propagated. We define the FPN-based economy-wide production loss as

$$L_{\mathrm{firm}}(\psi) = \sum_{i=1}^{n} \frac{s^{\mathrm{out}}_i}{\sum_{j=1}^{n} s^{\mathrm{out}}_j} \left(1 - h_i(T, \psi)\right). \qquad (3)$$

It is the fraction of the overall revenue in the network (measured in out-strength, $s^{\mathrm{out}}_i$, see Data and Methods) that is lost due to the shock and the indirect effects of its propagation. The IPN-based economy-wide production loss, $L_{\mathrm{ind.}}(\phi)$, is defined accordingly, see Eq. [9] in Data and Methods.
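Given final production levels from any propagation model, the economy-wide loss is a weighted average and can be sketched as follows. The sketch assumes the loss is the out-strength-weighted mean of $1 - h_i(T, \psi)$, which is how the text describes it ("the fraction of the overall revenue in the network that is lost"); the function name and the numbers are hypothetical, and the propagation model itself is not implemented here.

```python
def economy_wide_loss(h_final, s_out):
    """Share of total out-strength lost, given final production levels h_i(T) in [0, 1]."""
    total = sum(s_out)
    lost = sum(s * (1.0 - h) for s, h in zip(s_out, h_final))
    return lost / total


# Hypothetical outcome of a simulation with three firms.
s_out = [100.0, 300.0, 100.0]  # pre-shock output proxies
h_final = [0.5, 1.0, 0.75]     # remaining production fractions after propagation

loss = economy_wide_loss(h_final, s_out)
print(loss)  # 0.15, i.e. 75 of 500 revenue units are lost
```

Because the weights are output shares, a total failure of a small firm moves the aggregate far less than a partial failure of a large one; the heterogeneity discussed in the text enters through which firms end up with low final production levels.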
Figure 4 compares the production losses for the two simulations, FPN and IPN. The propagation on the FPN leads to a production loss, $L_{\mathrm{firm}}(\psi)$, of 11.5% (red solid line), while propagation on the IPN yields a loss, $L_{\mathrm{ind.}}(\phi)$, of 9.6% (blue dashed line). Aggregated industry-level shock propagation substantially underestimates the production losses caused by firm-level shock propagation dynamics, for the COVID-19 shock, $\psi$, by 16.5%. We next quantify the size of mis-estimations if the firm-level shock had been slightly different. We sample 1,000 distinct, synthetic realizations of the COVID-19 shock that are of the same size when aggregated to the industry level, but affect firms within industries differently. For every sector, $k$, we take the empirical distribution of $\zeta_i$ across the firms $i$ belonging to that sector. Then, we sample for every company, $i$, of sector $k$ a new value, $\zeta^{\mathrm{sample}}_i$, from this distribution, replace the old $\zeta_i$, and calculate the corresponding remaining production capacity vector, $\psi^{\mathrm{sample}} = 1 - \zeta^{\mathrm{sample}}$. In this way we generate the set of 1,000 synthetic capacity vectors, $\Psi = \{\psi^1, \psi^2, \ldots, \psi^{1,000}\}$; for the full algorithm, see SI Section 8. The resulting distribution of FPN-based economy-wide production losses, $L_{\mathrm{firm}}(\Psi)$, is shown as histogram and boxplot in Fig. 4. The losses vary strongly from 10.5% to 15.3% of economy-wide production, i.e., losses can vary by a factor of up to 1.46 for different initial shocks of the same size. The actual Hungarian GDP declined by 14.2% in Q2 2020 (OECD, 2023), showing that the losses obtained by our computations are within realistic bounds; a gross output estimate is not available for comparison.
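The core resampling step, drawing a new $\zeta_i$ for each firm from the empirical values of its own sector, can be sketched as below. This is only an illustration of that single step under stated assumptions: it draws with replacement and does not enforce that the aggregated industry-level shock stays exactly the same, which the full algorithm in SI Section 8 takes care of and which is not reproduced here. The function name, seed, and data are hypothetical.

```python
import random


def resample_within_sectors(zeta, p, rng):
    """Draw, for every firm, a new shock value from the empirical values of its sector.

    ASSUMPTION: sampling with replacement; the constraint that the aggregated
    sector shock is preserved is NOT enforced in this sketch.
    """
    by_sector = {}
    for z, k in zip(zeta, p):
        by_sector.setdefault(k, []).append(z)
    return [rng.choice(by_sector[k]) for k in p]


rng = random.Random(0)            # fixed seed for reproducibility
zeta = [0.2, 0.0, 0.5, 0.1, 0.0]  # hypothetical firm-level labor shocks
p = [1, 1, 1, 2, 2]               # sector of each firm

zeta_sample = resample_within_sectors(zeta, p, rng)
psi_sample = [1.0 - z for z in zeta_sample]
print(zeta_sample, psi_sample)
```

Repeating the call 1,000 times with the same generator would give a set of capacity vectors playing the role of $\Psi$; shock values never cross sector boundaries, so differences between realizations reflect only which firms within a sector are hit.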
Note that the 1,000 synthetic shocks propagating on the IPN always lead to the same economy-wide production loss (9.6%, blue dashed line) because all firm-level shocks, $\Psi$, impact the industry-level production capacities by exactly the same amount, $\phi$. On average the IPN-based production losses underestimate FPN-based losses by 2.3% of the economy-wide production. In relative terms losses are on average underestimated by 18.7%. For 10% of the shocks the underestimation is even larger than 26.3% and the maximum underestimation is 37.1%. This tail of large losses is clearly visible in the histogram and is caused by shocks affecting systemically relevant firms more strongly (Diem et al., 2022). The median and mean of the 1,000 losses, $L_{\mathrm{firm}}(\Psi)$, are 11.7% and 11.9%, respectively, and lie close to the FPN-based production loss, $L_{\mathrm{firm}}(\psi) = 11.5\%$, based on the original COVID-19 shock, $\psi$ (red line).
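The relative underestimation figures can be checked from the rounded losses quoted in the text. The sketch assumes that the underestimation is measured relative to the FPN-based loss, which is the convention that reproduces the reported 16.5% for the empirical shock; the function name is hypothetical.

```python
def relative_underestimation(l_firm, l_ind):
    """Share of the FPN-based loss that the IPN-based loss fails to capture."""
    return (l_firm - l_ind) / l_firm


l_ind = 0.096  # IPN-based loss, identical for all shocks

empirical = relative_underestimation(0.115, l_ind)  # empirical COVID-19 shock
worst = relative_underestimation(0.153, l_ind)      # largest synthetic FPN loss

print(round(100 * empirical, 1))  # 16.5
print(round(100 * worst, 1))      # 37.3
```

The worst case computed from the rounded values (37.3%) differs slightly from the 37.1% reported in the text, which is presumably calculated from unrounded losses.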
Mis-estimating industry-specific production losses. We now compare the IPN- and FPN-based production losses for every NACE2 industry separately. We define the FPN-based industry-specific production loss of industry $k$ in response to the COVID-19 shock $\psi$, $L^k_{\text{firm}}(\psi)$, as the fraction of revenue (measured in out-strength) that firms in sector $k$ lose due to the direct and indirect effects of the shock. The IPN-based industry-specific production loss, $L^k_{\text{ind}}(\phi)$, is defined accordingly. In Fig. 5 we show for each NACE2 industry the distribution of FPN-based production losses, $L^k_{\text{firm}}(\Psi)$, caused by the 1,000 synthetic shocks as a boxplot. The IPN-based production losses, $L^k_{\text{ind}}(\phi)$, are indicated by blue '+' symbols; the FPN-based production losses for the original COVID-19 shock, $L^k_{\text{firm}}(\psi)$, are given by red 'x' symbols. For many industries, losses vary strongly across the identically sized shocks, and the variation between industries is also noteworthy. For all but two industries (M73, N82), the production loss distributions are right-skewed; a few industries (B06, C15, K65, M75, Q87, and R92) have substantial outliers (grey dots) above 3 times the inter-quartile range. This means that for some particular shock realizations these sectors can suffer extremely large losses. The minimum and maximum values of production losses for different initial shocks can differ by factors of up to 9.5 (B06), 6.0 (B07), 5.7 (C12), 6.2 (J61), 41 (K65), or 25.9 (Q87). The median (mean) ratio of maximum to minimum loss is 2 (3.2). This variation in production losses across different shocks is inaccessible when using aggregated IPN data; it cannot be inferred from the blue '+' symbols. The large variations emerge because different shocks affect firms at different positions in the supply network that have different systemic relevance (Diem et al., 2022). IPN-based losses ('+') frequently lie below the lowest FPN-based loss, while FPN-based COVID-19 losses ('x') lie within the boxplots. The industries where IPN-based shock propagation underestimates output losses most are C26 (-59.5%), C28 (-53.5%), J58 (-51.3%), C25 (-50.3%), J63 (-47.8%), and C20 (-42.1%). Overestimation of production losses from using the IPN is highest for sectors K66 (150%), C19 (87.4%), R91 (83.3%), Q88 (80%), S94 (65.4%), and E39 (42%). For other sectors, see SI Section 9. We calculate for each industry the mean absolute deviation and take the average across industries, which yields 30.2%.
Figure 5: Comparison of industry-specific production losses, $L^k$, obtained from an empirically calibrated and 1,000 synthetic COVID-19 shocks propagating on the aggregated industry-level production network, IPN (blue '+' symbols), and on the firm-level production network, FPN (red 'x' symbols, boxplots). For most industries the FPN-based production losses, $L^k_{\text{firm}}(\Psi)$ (boxplots), vary strongly across the 1,000 synthetic shocks even though the shocks have the same size when aggregated to the industry level. Shock propagation on the industry level (blue '+' symbols) cannot capture this variation. IPN-based production losses typically underestimate the FPN-based production losses severely.
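As an illustration, the industry-specific loss (the share of an industry's out-strength that is lost) and the max-to-min ratio across shock scenarios can be sketched in Python as follows. This is a minimal sketch on hypothetical toy data, not the code used to produce the reported results; the function names `industry_losses` and `max_min_ratio` and all input values are illustrative assumptions.

```python
from collections import defaultdict


def industry_losses(s_out, h_final, industry):
    """Out-strength-weighted production loss per industry.

    s_out[i]    : original revenue (out-strength) of firm i
    h_final[i]  : remaining fraction of production of firm i after the shock
    industry[i] : industry label of firm i (e.g. a NACE2 code)
    """
    lost = defaultdict(float)
    total = defaultdict(float)
    for s, h, k in zip(s_out, h_final, industry):
        lost[k] += s * (1.0 - h)   # revenue lost by firm i
        total[k] += s              # original revenue of firm i
    return {k: lost[k] / total[k] for k in total if total[k] > 0}


def max_min_ratio(losses):
    """Ratio of the largest to the smallest loss of one industry across scenarios."""
    return max(losses) / min(losses)


# Hypothetical toy data: three firms in two industries.
s_out = [100.0, 300.0, 200.0]
h_final = [0.5, 0.75, 0.75]
industry = ["C26", "C26", "J61"]
losses = industry_losses(s_out, h_final, industry)
```

Applying `industry_losses` once per synthetic shock and collecting the values per industry gives the kind of distribution summarized by the boxplots; `max_min_ratio` then corresponds to the max-to-min factors quoted in the text.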
Last, we consider the hypothetical case that shocks propagate on the same PN, but assuming that all firms have linear production functions; see SI Section 10. We find that the distribution of economy-wide production losses, $L_{\text{firm}}(\Psi)$, ranges from 9.5% to 10.8%. This is substantially less variation than when realistic non-linear production functions are used. As expected, the linear production function assumption makes the economy-wide production losses less dependent on which exact firms within industries are impacted by shocks. However, the variations of industry-specific production losses, $L^k_{\text{firm}}(\Psi)$, are still very large for several sectors. This emphasizes that, in order to correctly estimate sector-level production losses, it is crucial which firms are affected by shocks, even in the best of all worlds, where shocks would propagate linearly.

Discussion
Production networks are fundamental for explaining and predicting dynamical economic phenomena. For almost a century, these were only accessible as aggregated industry-level production networks (IPNs), usually represented as input-output tables (IOTs). Only recently have large-scale firm-level production networks (FPNs) covering entire economies become available. Based on a unique firm-level production network data set, containing almost all buyer-supplier links of the Hungarian economy, we demonstrated on the one hand that the aggregation of production networks to the industry level cannot be expected to yield anything close to correct predictions of dynamical processes, such as the propagation of short-term shocks through production networks. On the other hand, we showed that a much more realistic picture can emerge when firm-level supply networks are used instead.
We first showed that industries are not sufficiently representative of the firms they include, because firms within industries are highly heterogeneous with respect to the industries they buy from and sell to. Specifically, two firms within the same industry source on average only 23.5% of their inputs from the same industries, and sell on average only 19.3% of their revenues to the same industries. Even when two firms belong to the same industry, their industry-level input and output vectors will differ substantially. Therefore, using industry-level production network data will likely cause substantial mis-estimations of dynamic processes actually occurring on the firm level.
We next demonstrated that the aggregation of FPNs indeed causes large mis-estimations of economic shock propagation dynamics and the resulting production losses. The demonstration is based on a COVID-19 shock that is realistically calibrated with firm-level employment data and on 1,000 synthetic COVID-19 shocks of the same size. While economy-wide production losses in response to the 1,000 shock scenarios simulated on the FPN range from 10.5% to 15.3% (mean 11.9%), the corresponding IPN-based production loss is 9.6%. In the worst-case scenario the underestimation amounts to 37.1%. For single industries, the largest average mis-estimations of production losses range from -59.5% to 150%.
Implications for economic modelling and policy making. The presented results imply a range of immediate consequences for economic modelling, in particular for short-term economic dynamics such as shock propagation, but also more generally for the reliability of industry-level IO models in the context of testing policy implications.
First, our findings make it clear that the size of losses from shock propagation depends crucially on which exact firms are affected by the initial shock. Crises such as COVID-19, the war in Ukraine, or large natural disasters can affect firms within the same industry sectors and regions very differently and, hence, the exact materialization of the shock can lead to significantly different indirect economic losses. Aggregated industry-level models, such as IO models, cannot account for this by design, potentially underestimating tail losses that appear when a group of systemically important firms receive shocks at the same time. Modelling impact propagation on firm-level production networks might significantly improve economic assessments of crises of this kind.
Second, our method for creating an ensemble of synthetic shock scenarios that are identical on the industry level, but affect firms differently, can be used to estimate realistic confidence intervals for economic impacts of crises. Experts can define a shock on the industry level (as done routinely for IO models) and obtain distributions of the quantity of interest for each firm, for specific sectors, or for the whole production network. This approach could reveal which combinations of shocks to individual firms cause particularly dangerous scenarios that would go unnoticed with industry-level models. This is useful for designing scenarios in economic stability stress tests.
Third, the presented framework extends well beyond shock propagation. Other forms of network dynamics that are certainly distorted by industry-level aggregation include economic growth, the estimation of CO2 emissions of economic activity, or the spread of price increases. Detailed future research on these topics that considers the details of firm-level production networks is necessary. These processes unfold on longer timescales and will be overlaid with other dynamics that were not covered here. These dynamics are most likely more complicated than those of short-term shock propagation, and it is therefore reasonable to assume that the effects of aggregation are even stronger in these situations.
Fourth, specifically for estimating CO2 emissions of industry sectors and countries, aggregating input-output tables causes substantial errors in emission estimates (Su et al., 2010; Su and Ang, 2010; Lenzen, 2011). Our results indicate that firms in the same NACE industries use very different inputs and sell to very different industries; therefore, their resulting scope-3 emissions (indirect CO2 emissions along supply chains) can differ substantially. Firm-level data will be crucial for reliable and targeted CO2 emission estimates and for designing economic policies that enhance the green transition and can target problematic firms (Stangl et al., 2023).
Fifth, in the past, economic models, e.g., for assessing economic effects of natural disasters, such as that of Henriet et al. (2012), have worked with the simplifying assumption that firms within an industry are the same with respect to their input and output vectors. Our results suggest that, for estimating and predicting effects of natural disasters more reliably in the future, production network models should carefully feature firm-level heterogeneity within industries.
Limitations and future research. The presented analysis has several limitations. For self-consistency, the industry-level production network used here is simply the aggregation of a firm-level production network. IO tables are constructed with extensive survey methodologies, and the available tables can differ (Borsos and Stancsics, 2020). However, IO tables are also aggregations of underlying firm- and establishment-level networks and are likely to be affected by the same problems to a comparable extent.
Secondary NACE categories of firms are not contained in our dataset. Larger firms producing several different types of products (in potentially several establishments) are fully aggregated to their primary NACE category. This could lead to an overestimation of heterogeneity of input and output vectors within industries. Future research should quantify the heterogeneity of input and output vectors of establishments used for creating IO tables.
A potentially strong limitation is that we do not have information on firms' international import and export links. Consider two firms in one sector: if one imports a specific input and the other sources it domestically, we would overestimate the heterogeneity of their input vectors. However, for the Belgian production network it has been shown that only a small fraction of firms have direct import and export linkages (Dhyne et al., 2021).
In practice, high-quality economic data to calibrate industry-level economic models are widely available, and some of these models have achieved good forecasting performance (Pichler et al., 2022). To calibrate firm-level models, substantially larger amounts of data are needed. For example, quantifying how a shock (e.g., a natural disaster) affects hundreds of thousands of firms is substantially harder than for a few dozen sectors. Firms within sectors do react differently; modelling their behavior realistically involves many assumptions, and up to now data for calibration are scarce.
We demonstrated that details matter for how shocks propagate. In our simulation model, important non-linearities appear in the generalized Leontief production functions (GLPFs) of companies. The calibration of firms' GLPFs is currently a rough approximation combining firms' NACE4 industry affiliation with an expert-based survey for 56 industry sectors conducted in Pichler et al. (2022). The calibration of the GLPFs needs refinement in the future, e.g., with large-scale firm-level surveys.
Our results point to relevant open questions. Duprez and Magerman (2018) find large idiosyncrasies in price changes of producers within the same product categories. It would be interesting to see whether these could be explained by the heterogeneity of firms' input and output vectors. Regarding IO tables, differences in Leontief multipliers for different aggregation levels of IO tables, with potential implications for predicting economic growth, were reported (McNerney et al., 2022). It would be of interest to see how this extends across all scales down to the firm level. Heinrich et al. (2022) show that correlation structures found on the sector level do not hold on the firm level. For this phenomenon, too, the intra-sector heterogeneity of firms could be part of the explanation. The effects of heterogeneities should also be checked for establishment-level supply networks (Schueller et al., 2022).
General equilibrium models (Acemoglu et al., 2012; Carvalho and Tahbaz-Salehi, 2019; Magerman et al., 2016) were shown to depend on network measures such as Leontief multipliers or the 'influence vector'. These are likely to be distorted by aggregating production networks to the industry level. The sensitivity of results to aggregation could be investigated under a framework similar to the present one.
It has been shown that both industry-level (Acemoglu et al., 2012) and firm-level (Borsos and Stancsics, 2020) production networks exhibit power-law scaling patterns. It would be fascinating to find out under what conditions these are preserved under aggregation and, if they are not, whether that would explain the differences in shock propagation and other network dynamics. Another open question is which network modules are particularly affected by aggregation. Finally, our data show that input and output vectors of firms remain relatively stable from one year to another. This raises the question of how fast production networks can adapt to technological change, and whether an aggregate perspective on production networks would under- or overestimate the speed of adaptation in the network. Further remaining questions concern input combinations. The reported large heterogeneity of inputs and outputs of firms within the same sectors implies that the same output can be produced from different input combinations. If one input is no longer available, this might affect a certain company while others continue production. In the longer term, if one input becomes structurally more expensive, firms could change their production to mimic competitors that use a different input mix to produce the same good. This raises the question of whether this large amount of heterogeneity in input and output vectors is actually a source of resilience in the production network, or just an inefficiency in knowledge transfer.
To conclude, in this work we showed the importance of modelling production networks on the firm level. However, data on firm-level production networks currently exist only in very few countries and are rarely available for research. This work shows how necessary it is to make these data usable for researchers and policy institutions. Complementing traditional industry-level models with new models that are specifically designed for firm-level data is a great opportunity for both reliable policy making and the progress of scientific research on the resilience and transformability of the current economy.

Data and Methods
Data. The Hungarian FPN, $W$, is based on the 2019 VAT micro-data of the Hungarian Central Bank (Borsos and Stancsics, 2020; Diem et al., 2022). Supply links between two firms are present if the tax content of the transactions was above 1 million Forint for 2018Q1-Q2 and 100,000 Forint for 2018Q3-2019Q4 (approx. 250 euros). The link weight, $W_{ij}$, represents the monetary value of all transactions between the two firms in the given year. We filter the data for stable supply links and keep a link if at least two supply transactions occurred in two different quarters, i.e., we exclude one-off transactions. The filtering reduces the number of links from approx. 2 million to 1.1 million, but the transaction volume drops only by approx. 10%. The number of firms drops from 315,259 to 243,339 in 2019 and from 296,992 to 185,322 in 2018. Imports and exports are not contained in the data set. The industry affiliation of firms, $p_i$, corresponds to the NACE classifications contained in the Hungarian corporate tax registry. On the NACE2 level 86 different classes are present, on the NACE4 level 587. In 2019 the NACE affiliation is missing for 62,782 firms; in 2018 for 42,385 firms. We treat them as a residual NACE class.
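The stable-link filter described above can be illustrated with a short Python sketch. It is a hedged illustration on hypothetical records, not the actual data pipeline: the tuple layout `(supplier, buyer, quarter, value)` and the function name `filter_stable_links` are assumptions, and "two transactions in two different quarters" is interpreted here as the link being observed in at least two distinct quarters.

```python
from collections import defaultdict


def filter_stable_links(transactions):
    """Keep supplier-buyer links observed in at least two different quarters.

    transactions: iterable of (supplier, buyer, quarter, value) tuples
                  (hypothetical record layout).
    Returns {(supplier, buyer): total transaction value} for retained links.
    """
    quarters = defaultdict(set)
    weight = defaultdict(float)
    for supplier, buyer, quarter, value in transactions:
        quarters[(supplier, buyer)].add(quarter)
        weight[(supplier, buyer)] += value
    # One-off links (a single quarter only) are dropped.
    return {link: w for link, w in weight.items() if len(quarters[link]) >= 2}


# Hypothetical toy records.
transactions = [
    ("A", "B", "2019Q1", 100.0),
    ("A", "B", "2019Q3", 50.0),
    ("A", "C", "2019Q2", 500.0),   # one-off transaction
    ("B", "C", "2019Q4", 10.0),
    ("B", "C", "2019Q4", 15.0),    # two transactions, but in the same quarter
]
stable = filter_stable_links(transactions)
```

In this toy example only the link from A to B survives, with a weight equal to the sum of its transactions.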
Constructing firm- and industry-level COVID-19 shocks. The employment data (collected by the Hungarian tax authority and available at the central bank) contain the number of employees, $e_i(\tau)$, that firm $i$ employed in the respective month $\tau$. We assume that labor is an essential (Leontief-style) input to a firm's production (Eq. [6]), and that after a shock firms only keep the number of employees needed to operate at the new, reduced production level. Therefore, we treat the empirical reduction of employees as a signal for how strongly the firm was affected by the consequences of the pandemic at the beginning of 2020. No furlough schemes were in place in Hungary. Note that January is sufficiently distant from COVID-19 affecting Europe and May is the time when the initial shock should be fully incorporated in the employment data; there is a two-month leave notice period in Hungary. The Hungarian labor data are available for approx. 160,000 firms. For the firms with no data we impute the value by drawing the fraction of employment from firms in the respective NACE4 category for which the data are available. We conduct the imputation 1,000 times and obtain 1,000 completed vectors. For each of them, we calculate the value of economy-wide lost production, $L_{\text{firm}}$ (see Eq. [3]). We choose the completed shock vector that yields the median loss of production as $\psi$. The corresponding industry-level COVID-19 shock is calculated by aggregating the vector $\psi$ to the NACE2 industry level. As firms within a sector mostly have different ratios of in- and out-strength, i.e., $s^{\text{in}}_i/s^{\text{out}}_i \neq s^{\text{in}}_j/s^{\text{out}}_j$, we aggregate the firm-level production capacities to a vector of downstream-constrained, $\phi^d$, and a vector of upstream-constrained, $\phi^u$, remaining production capacities. For each industry $k$, the elements $\phi^u_k$ and $\phi^d_k$ are obtained by aggregating the remaining production capacities, $\psi_i$, of the firms in industry $k$. We use the notation $\phi = (\phi^u, \phi^d)$. We show the aggregated shock, $\phi$, for each NACE2 class in SI Section 9. Synthetic shocks, $\psi^1, \psi^2, \ldots, \psi^{1,000}$, that are identical to $\phi^d$ and $\phi^u$ when aggregated to the industry level are obtained by requiring that $\psi^1, \psi^2, \ldots, \psi^{1,000}$ fulfil Eq. [5]. This implies that all 1,000 firm-level shocks aggregate to the same industry-level shock, $\phi$. For details, see SI Section 8.
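To make the aggregation step concrete, the following Python sketch aggregates a firm-level shock to the industry level and checks whether two firm-level shocks are indistinguishable after aggregation. It is a minimal sketch under a loudly stated assumption: the upstream-constrained capacity is taken as the in-strength-weighted mean and the downstream-constrained capacity as the out-strength-weighted mean of the firms' remaining production capacities. The exact weighting used in the paper is given by its own equations; the function names and toy values here are hypothetical.

```python
from collections import defaultdict


def aggregate_shock(psi, s_in, s_out, industry):
    """Aggregate a firm-level shock to industry-level (phi_u, phi_d).

    psi[i]      : remaining production capacity of firm i, in [0, 1]
    s_in[i]     : in-strength (purchases) of firm i
    s_out[i]    : out-strength (sales) of firm i
    industry[i] : industry label of firm i

    ASSUMPTION: phi_u weights firms by in-strength, phi_d by out-strength.
    """
    num_u, den_u = defaultdict(float), defaultdict(float)
    num_d, den_d = defaultdict(float), defaultdict(float)
    for p, si, so, k in zip(psi, s_in, s_out, industry):
        num_u[k] += si * p
        den_u[k] += si
        num_d[k] += so * p
        den_d[k] += so
    # Industries without in- or out-strength are treated as unshocked (assumption).
    phi_u = {k: num_u[k] / den_u[k] if den_u[k] > 0 else 1.0 for k in den_u}
    phi_d = {k: num_d[k] / den_d[k] if den_d[k] > 0 else 1.0 for k in den_d}
    return phi_u, phi_d


def same_industry_shock(psi_a, psi_b, s_in, s_out, industry, tol=1e-9):
    """True if two firm-level shocks coincide after aggregation."""
    ua, da = aggregate_shock(psi_a, s_in, s_out, industry)
    ub, db = aggregate_shock(psi_b, s_in, s_out, industry)
    return all(abs(ua[k] - ub[k]) <= tol and abs(da[k] - db[k]) <= tol for k in ua)


# Hypothetical toy example: two equally sized firms in one industry.
s_in = [10.0, 10.0]
s_out = [20.0, 20.0]
industry = ["C26", "C26"]
psi_a = [0.5, 1.0]   # firm 0 is hit
psi_b = [1.0, 0.5]   # firm 1 is hit instead
```

Both toy shocks aggregate to a remaining capacity of 0.75 for the industry, although they hit different firms. This is the kind of difference an industry-level model cannot see.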
Shock propagation model. The production process of each firm $i$ is represented by a generalized Leontief production function (Eq. [6]), where $\Pi_{ik}$ is the amount of input $k$ firm $i$ uses for production, $I^{\text{es}}_i$ is the set of essential inputs, and $I^{\text{ne}}_i$ is the set of non-essential inputs of firm $i$; $l_i$ and $c_i$ are $i$'s labor and capital inputs. The essential and non-essential input types of firms are assigned according to their industry affiliation (NACE4) and an expert-based survey for 56 industry sectors conducted by Pichler et al. (2022). The parameters $\alpha_{ik}$ are technologically determined coefficients, $\beta_i$ is the maximum production level possible without non-essential inputs $k \in I^{\text{ne}}_i$, and $\alpha_i$ is chosen to interpolate between the full production level (with all inputs) and $\beta_i$. All parameters are determined by $W$, $I^{\text{es}}_i$, and $I^{\text{ne}}_i$. The COVID-19 shock, $\psi$, propagates through the Hungarian production network in the following way. Initially, at time $t = 0$, the network, $W$, is stable and the production amount of each firm $i$ corresponds to its out-strength, $x_i(0) = s^{\text{out}}_i$, where $s^{\text{out}}_i$ corresponds to firm $i$'s original revenue from its activity in the FPN, $W$. We denote firm $i$'s remaining fraction of production at time $t$ as $h_i(t) = x_i(t)/x_i(0)$; hence, at time $t = 0$, before any shocks occur, $h_i(0) = 1 \; \forall i$. At time $t = 1$ the initial shock materializes and the production level of each firm $i$ drops to the remaining production capacity, $h_i(1) = \psi_i$. Then, we simulate how firms propagate the received shock upstream by reducing their demand to suppliers and downstream by reducing their supply to customers. Missing non-essential inputs cause production reductions in a linear fashion, while a lack of essential inputs affects output in the non-linear Leontief way, i.e., downstream shocks can have strong negative impacts on production, depending on the supplier-buyer industry pair. The loss of a customer leads to a production reduction proportional to the customer's revenue share, i.e., upstream shocks have only linear impacts. For each firm $i$, the production output, $x^d_i(t+1)$, at $t+1$ is updated according to Eq. [7], given the downstream-constrained production levels of its suppliers, $h^d_j(t)$, at time $t$. The production output, $x^u_i(t+1)$, of firm $i$ at $t+1$ is computed according to Eq. [8], given the upstream-constrained production levels of its customers, $h^u_l(t)$, at time $t$. The algorithm converges at time $T$, yielding final production levels, $h_i(T, \psi)$, for each firm $i$. The dependence of the final production level on the initial shock is made explicit by writing $h_i$ as a function of $\psi$. Note that the quantity $s^{\text{out}}_i \left(1 - h_i(T, \psi)\right)$ is the amount of lost revenue of firm $i$ due to the initial shock and its propagation. For a complete description of the algorithm, see Diem et al. (2022). For simulating shocks on the industry-level network, $Z$, we replace $W$ with $Z$ in Eqs. [7]-[8], $\psi_i$ with $\phi^d_i$ in Eq. [7], and $\psi_i$ with $\phi^u_i$ in Eq. [8]. This results in the final production levels, $h_k(T, \phi)$, for each sector $k$, and we set $L^k_{\text{ind}}(\phi) = 1 - h_k(T, \phi)$. The overall production loss, $L_{\text{ind}}(\phi)$, is calculated analogously to Eq. [3], based on the out-strengths, $s^{\text{out}}_k$, of the sectors $k$, as
$$L_{\text{ind}}(\phi) = \sum_{k} \frac{s^{\text{out}}_k}{\sum_{l} s^{\text{out}}_l} \left(1 - h_k(T, \phi)\right).$$

In this section we show how to calculate the intra-sector heterogeneity (or similarity) of the firms' input-output vectors. We start by aggregating every firm's firm-level input and output vector to the NACE2 industry level. The $i$th column of the FPN's adjacency matrix, $W$, represents the firm-level input vector, $W_{\cdot i}$, of firm $i$, while the $i$th row gives the firm-level output vector, $W_{i \cdot}$. We compute the corresponding industry-level input vector, $\Pi^{\text{in}}_{i \cdot}$, and output vector, $\Pi^{\text{out}}_{i \cdot}$, of firm $i$ by aggregating all in-links (purchases) from $i$'s suppliers in the same industry and all out-links (sales) to $i$'s customers in the same industry, as
$$\Pi^{\text{in}}_{ik} = \sum_{j} W_{ji}\, \delta_{p_j, k}, \qquad \Pi^{\text{out}}_{ik} = \sum_{j} W_{ij}\, \delta_{p_j, k}.$$
The element $\Pi^{\text{in}}_{ik}$ specifies the amount of input $k$ firm $i$ is buying from suppliers $j$ of industry $k$, i.e., all $j$ with $p_j = k$. The element $\Pi^{\text{out}}_{ik}$ specifies the amount firm $i$ is selling to firms $j$ in industry $k$, i.e., all $j$ with $p_j = k$. The expression $\delta_{p_j, k}$ is the Kronecker delta; it is equal to one if firm $j$ produces product $k$, i.e., belongs to industry $k$, and zero otherwise. We focus on the relative importance of firms' input types (industries) and customer industries, independent of firm size. To do so, we compute the normalized input and output vectors, $\bar{\Pi}^{\text{in}}_{i \cdot}$ and $\bar{\Pi}^{\text{out}}_{i \cdot}$, as
$$\bar{\Pi}^{\text{in}}_{ik} = \frac{\Pi^{\text{in}}_{ik}}{\sum_{l} \Pi^{\text{in}}_{il}}, \qquad \bar{\Pi}^{\text{out}}_{ik} = \frac{\Pi^{\text{out}}_{ik}}{\sum_{l} \Pi^{\text{out}}_{il}}.$$
We quantify the similarity between input and output vectors of two firms with the overlap coefficient (OC) due to its clear economic interpretability. To show that our results do not depend on the specific choice of the similarity measure, we also look at the Jaccard index (JI).
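The aggregation of firm-level links to industry-level input and output vectors described above can be sketched in Python as follows. This is an illustrative sketch on a hypothetical three-firm network, assuming that `W[i][j]` holds the sales of firm `i` to firm `j` (so that column `i` contains the purchases of firm `i`); the function names `industry_vectors` and `normalize` are not from the paper.

```python
def industry_vectors(W, p, industries):
    """Industry-level input and output vectors of every firm.

    W[i][j]    : sales of firm i to firm j (assumed link direction)
    p[i]       : industry label of firm i
    industries : ordered list of industry labels (vector coordinates)
    Returns (pi_in, pi_out), each a list of per-firm vectors.
    """
    n = len(W)
    idx = {k: c for c, k in enumerate(industries)}
    pi_in = [[0.0] * len(industries) for _ in range(n)]
    pi_out = [[0.0] * len(industries) for _ in range(n)]
    for i in range(n):
        for j in range(n):
            w = W[i][j]
            if w > 0:
                pi_out[i][idx[p[j]]] += w   # i sells to the industry of j
                pi_in[j][idx[p[i]]] += w    # j buys from the industry of i
    return pi_in, pi_out


def normalize(v):
    """Divide a vector by its 1-norm; all-zero vectors are returned unchanged."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else list(v)


# Hypothetical toy network: firm 0 in industry "A", firms 1 and 2 in "B".
W = [[0, 10, 30],
     [0, 0, 20],
     [5, 0, 0]]
p = ["A", "B", "B"]
pi_in, pi_out = industry_vectors(W, p, ["A", "B"])
shares_in_2 = normalize(pi_in[2])
```

In this toy network firm 2 buys 30 from industry A and 20 from industry B, so its normalized input vector is (0.6, 0.4), independent of the firm's size.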

SI Section 2. Details for calculation and interpretation of the input and output overlap coefficient
In general, the overlap coefficient of two vectors $x$, $y$ of dimension $m$ is defined as
$$\text{OC}(x, y) = \frac{\sum_{k=1}^{m} \min(x_k, y_k)}{\min\left(\sum_{k=1}^{m} x_k, \sum_{k=1}^{m} y_k\right)}. \qquad \text{(S.1)}$$
We calculate the overlap coefficient of the 1-norm ($\| \cdot \|_1$) normalized input and output vectors, i.e., $\bar{\Pi}^{\text{in}}_{i}$ and $\bar{\Pi}^{\text{out}}_{i}$. Therefore, in each calculation both vectors sum to one, so the denominator is always equal to one and can be dropped. The overlap coefficient is closely related to the weighted Jaccard index, which has the same numerator and $\sum_{k=1}^{m} \max(x_k, y_k)$ as the denominator. The overlap coefficient is also called the Szymkiewicz-Simpson coefficient (Jones and Furnas, 1987; Vijaymeena and Kavitha, 2016).
As introduced in the main text, for our application we calculate the input overlap coefficient (IOC) and output overlap coefficient (OOC) of two firms $i$ and $j$ as
$$\text{IOC}_{ij} = \sum_{k=1}^{m} \min\left(\bar{\Pi}^{\text{in}}_{ik}, \bar{\Pi}^{\text{in}}_{jk}\right), \qquad \text{(S.2)}$$
$$\text{OOC}_{ij} = \sum_{k=1}^{m} \min\left(\bar{\Pi}^{\text{out}}_{ik}, \bar{\Pi}^{\text{out}}_{jk}\right). \qquad \text{(S.3)}$$
The denominator from Eq. S.1 can be omitted since $\sum_{k=1}^{m} \bar{\Pi}^{\text{in}}_{ik} = 1$ and $\sum_{k=1}^{m} \bar{\Pi}^{\text{out}}_{ik} = 1$ for all $i$. We calculate the distribution of the two measures for each industry $k$ by computing all pairwise $\text{IOC}_{ij}$ and $\text{OOC}_{ij}$ for all firms with $p_i = p_j = k$ and $i \neq j$ in the respective industry $k$. The input overlap coefficient, $\text{IOC}_{ij}$, of two firms $i$ and $j$ gives the fraction of their overall inputs they source from the same industries, i.e., the overlap of their industry input shares. The output overlap coefficient, $\text{OOC}_{ij}$, of two firms $i$ and $j$ specifies the fraction of their overall sales they sell to the same industries, i.e., the overlap of their industry sales shares. Note that $\text{IOC}_{ij}$ also quantifies $i$'s and $j$'s overlap of exposures to other economic dynamics, like price increases or innovations of supplying industries. Similarly, $\text{OOC}_{ij}$ measures the common exposure to, e.g., innovation in the buyer industry that makes the input of firms obsolete.
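For illustration, the overlap coefficient and the weighted Jaccard index can be computed with a few lines of Python. This is a minimal sketch on hypothetical normalized share vectors; the function names are illustrative, and the general overlap coefficient is implemented in its standard Szymkiewicz-Simpson form, with the smaller of the two vector sums in the denominator.

```python
def overlap_coefficient(x, y):
    """Sum of element-wise minima divided by the smaller vector sum."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = min(sum(x), sum(y))
    return num / den if den > 0 else 0.0


def weighted_jaccard(x, y):
    """Sum of element-wise minima divided by the sum of element-wise maxima."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


# Hypothetical normalized industry input shares of two firms (each sums to one).
x = [0.5, 0.5, 0.0]
y = [0.25, 0.25, 0.5]
ioc = overlap_coefficient(x, y)
iji = weighted_jaccard(x, y)
```

Here the two firms share half of their input shares (overlap coefficient 0.5), while the weighted Jaccard index is 1/3. Because both vectors sum to one, the overlap coefficient reduces to the sum of the element-wise minima.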
Since we use industry-level input and output vectors, which neglect firm-level differences within industries, the real level of heterogeneity could be even larger. In our dataset, cross-border import and export links of firms are not available. This could lead to a potential underestimation of overlaps, but a study for Belgium (Dhyne et al., 2021) shows that firms' import and export links are few in relation to national import and export links.

SI Section 3. Further results on input and output overlaps
Further results for NACE C26
Even though in all four in-degree (out-degree) groups there are firms with very similar input (output) vectors, the results clearly show that in general firms have surprisingly small overlaps with respect to their suppliers' industries (inputs) and customers' industries (outputs). This implies that if two random firms in in-degree (out-degree) bin >35 receive a shock of the same absolute size, on average only 34% (14%) of the shock's volume is propagated to firms of the same industries, while 66% (86%) of the shock is propagated to firms in other industries. At the same time it means that two firms in this industry have on average 66% (86%) of their upstream (downstream) exposures to different supplier (buyer) industries. The low level of similarity of input and output vectors clearly shows that aggregating these firms into a single industry is not representative of the single firms' input-output vectors and will lead to large biases and mis-estimations of economic dynamics.

Further results on output overlaps across industries
The highest median $\text{OOC}_{ij}$ are found in Veterinary activities (M75), Manufacture of beverages (C11), Manufacture of other transport equipment (C30), Forestry and logging (A2), Manufacture of leather and related products (C15), Manufacture of basic pharmaceutical products (C21), and Telecommunications (J61), whereas the lowest median $\text{OOC}_{ij}$ are found in service sectors such as Public administration and defence; compulsory social security (O84), Travel agency and related activities (N79), or Scientific research and development (M72). However, non-service sectors such as Remediation activities and other waste management services (E39), Other manufacturing (C32), or Manufacture of textiles (C13) are also among the sectors with the lowest output overlaps. The average standard deviation is 0.17, the standard deviation of standard deviations is 0.047, and the error bar length appears to be relatively homogeneous across sectors. This indicates that the variation of pairwise output overlaps, $\text{OOC}_{ij}$, within sectors is relatively similar across sectors. For the other degree bins see SI Fig. S6.
The same results are shown for the other three out-degree bins 1-5, 6-15, and 16-35 in SI Section 4, Fig. S3. As for industry C26, the output overlaps are smaller for lower degree bins; the averages over the mean (median) output overlaps are 0.110 (0.021), 0.157 (0.135), and 0.223 (0.215) for the bins 1-5, 6-15, and 16-35, respectively. The averages over the standard deviations of output overlaps are 0.226, 0.129, and 0.109, respectively; therefore, the variation of output overlaps within industries is on average decreasing with the number of out-links. Figure S1b illustrates this relationship more clearly by showing for each out-degree size bin (1-5, 6-15, 16-35, >35) the boxplot of the industries' median OOC values. It is clearly visible that output vectors of firms within industries become more homogeneous with the number of buyers. SI Section 5 shows that $\text{OOC}_{ij}$ are even lower when computed at the NACE4 level. SI Fig. S4b shows that the average of the mean (median) output overlaps across NACE4 industries is 0.231 (0.207). The standard deviation of mean (median) output overlaps is 0.179 (0.19), i.e., higher than for the NACE2 level. This indicates that the variation of average output vector overlaps is higher at the NACE4 level. The average standard deviation is 0.126 and the standard deviation of standard deviations is 0.056. Note that the average IOC and OOC levels seem to be more similar at the NACE4 level than at the NACE2 level, where the average IOC is higher than the average OOC. For the other degree bins see SI Fig. S7.
SI Fig. S9 in SI Section 6 shows qualitatively similar results for the Jaccard index for the degree bin >35. The pairwise input Jaccard index (IJI) distributions are slightly shifted towards higher similarity values, with an average over means (medians) of 0.398 (0.394), and show slightly less variation, with a standard deviation of means (medians) of 0.07 (0.067). The pairwise output Jaccard index (OJI) distributions are also shifted towards slightly higher similarity values, with an average over means (medians) of 0.301 (0.291), and show slightly less variation, with a standard deviation of means (medians) of 0.076 (0.077).

SI Section 4. Overlap coefficients across industries for other degree bins
This section shows the results of the pairwise input overlap coefficient, IOC, and output overlap coefficient, OOC, distributions across all NACE2 industries for the three degree bins 1-5, 6-15, and 16-35 that are not shown in Fig. 3. As for Fig. 3, we calculate the summary statistics (mean and the 5%, 25%, 50% (median), 75%, and 95% percentiles) for the pairwise IOC (SI Fig. S2) and OOC (SI Fig. S3) distributions for all NACE2 industries. Again these statistics are visualized as boxplots. The x-axis shows the 86 NACE2 codes present in the data set; the y-axis denotes the overlap coefficients; each boxplot corresponds to a NACE2 class. The dark thick horizontal bars correspond to the median ($p_{50\%}$), the interquartile range ($p_{25\%}$ to $p_{75\%}$) is shown as thick dark vertical lines, and the error bars ($p_{5\%}$ to $p_{95\%}$) are indicated by thin light vertical lines. The thin vertical black lines separate NACE2 classes by their NACE1 affiliation.
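The per-industry summary statistics can be reproduced in outline with the following Python sketch. It is a hedged illustration, not the authors' code: the percentile convention (linear interpolation between order statistics) is an assumption, since the text does not state which one was used, and the function names and toy vectors are hypothetical.

```python
from itertools import combinations
from statistics import mean


def percentile(sorted_vals, q):
    """Percentile by linear interpolation (assumed convention), q in [0, 100]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def pairwise_overlap_summary(vectors):
    """Summary statistics of all pairwise overlaps within one industry.

    vectors: normalized (sum-to-one) industry share vectors, one per firm;
             at least two firms are required.
    """
    ocs = sorted(
        sum(min(a, b) for a, b in zip(x, y))
        for x, y in combinations(vectors, 2)
    )
    summary = {"mean": mean(ocs)}
    for q in (5, 25, 50, 75, 95):
        summary[f"p{q}"] = percentile(ocs, q)
    return summary


# Hypothetical toy industry with three firms and two supplier industries.
vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
stats = pairwise_overlap_summary(vectors)
```

For the toy industry the three pairwise overlaps are 0, 0.5, and 0.5, so the median overlap is 0.5 and the mean is 1/3.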
The results for the IOC distributions are shown in SI Fig. S2. SI Fig. S2a shows the pairwise $\text{IOC}_{ij}$ for firms with in-degree between 1 and 5, $1 \leq k^{\text{in}}_i \leq 5$. The mean over the industries' mean (median) IOC is 0.132 (0.009), and the standard deviation of mean (median) IOCs is 0.081 (0.062). The mean standard deviation is 0.262. SI Fig. S2b shows the pairwise $\text{IOC}_{ij}$ for firms with in-degree between 6 and 15, $6 \leq k^{\text{in}}_i \leq 15$. The mean over the industries' mean (median) IOC is 0.202 (0.148), and the standard deviation of mean (median) IOCs is 0.081 (0.088). The mean standard deviation is 0.192. SI Fig. S2c shows the pairwise $\text{IOC}_{ij}$ for firms with in-degree between 16 and 35, $16 \leq k^{\text{in}}_i \leq 35$. The mean over the industries' mean (median) IOC is 0.269 (0.241), and the standard deviation of mean (median) IOCs is 0.083 (0.091). The mean standard deviation is 0.168. As for NACE C26 in the main text, we see that on average input vector overlaps increase with the number of suppliers.
The results for the OOC distributions are shown in SI Fig. S3. SI Fig. S3a shows the pairwise $\text{OOC}_{ij}$ for firms with out-degree between 1 and 5, $1 \leq k^{\text{out}}_i \leq 5$. The mean over the industries' mean (median) OOC is 0.110 (0.021), and the standard deviation of mean (median) OOCs is 0.094 (0.118). The mean standard deviation is 0.226. SI Fig. S3b shows the pairwise $\text{OOC}_{ij}$ for firms with out-degree between 6 and 15, $6 \leq k^{\text{out}}_i \leq 15$. The mean over the industries' mean (median) OOC is 0.157 (0.135), and the standard deviation of mean (median) OOCs is 0.078 (0.074). The mean standard deviation is 0.129. SI Fig. S3c shows the pairwise $\text{OOC}_{ij}$ for firms with out-degree between 16 and 35, $16 \leq k^{\text{out}}_i \leq 35$. The mean over the industries' mean (median) OOC is 0.223 (0.215), and the standard deviation of mean (median) OOCs is 0.078 (0.078). The mean standard deviation is 0.109. Again, average overlaps seem to increase with degree (number of customers). Further, output overlaps are on average slightly lower than input overlaps.
Next we illustrate how average similarity increases with the degree bins. Fig. S1 illustrates this relationship more clearly by showing for each degree size bin (1-5, 6-15, 16-35, >35) the boxplot of the industries' median IOC and OOC values. Fig. S1a shows boxplots of the median input overlap coefficients, IOC, for all NACE2 industries for each in-degree bin. We see that for the bin with 1 to 5 suppliers almost all medians are zero. The distribution of medians is then substantially shifted upwards for the bin of 6-15 suppliers, and it continues to increase for the other two in-degree bins with 16-35 and more than 35 suppliers, respectively. Fig. S1b shows boxplots of the median output overlap coefficients, OOC, for all NACE2 industries for each out-degree bin. We see that for the bin with 1 to 5 buyers almost all medians are zero. The distribution of medians is then slightly shifted upwards for the bin of 6-15 buyers, but there are several outlier industries with higher output overlaps. The median OOCs continue to increase for the other two out-degree bins with 16-35 and more than 35 buyers, respectively. It is visible that the upper tails of the median OOC distributions are longer than those of the median IOC distributions. Overall, median OOCs are lower than median IOCs.
Figure S1: Increase of input- and output-vector similarity with increasing in-degree, k in , and out-degree, k out , bins (1-5, 6-15, 16-35, >35). a) boxplots of the median input overlap coefficients for all NACE2 industries for each in-degree bin, respectively. b) boxplots of the median output overlap coefficients for all NACE2 industries for each out-degree bin, respectively. It is clearly visible that input- and output-vectors of firms within industries become on average more similar (higher median IOC and OOC values) with the number of suppliers and buyers.
Figure S2: Distributions of pairwise input vector overlaps, IOC i j , of firms across NACE 2 industries for three in-degree size bins.NACE2 classes are on the x-axis; overlap coefficients on the y-axis.a) pairwise IOC i j for firms with in-degree between one and five, 1 ≤ k in i ≤ 5.The mean over the industries' mean (median) IOC is 0.132 (0.009), the standard deviation of mean (median) IOCs is 0.081 (0.062).The mean standard deviation is 0.262.b) pairwise IOC i j for firms with in-degree between 6 and 15, 6 ≤ k in i ≤ 15.The mean over the industries' mean (median) IOC is 0.202 (0.148), the standard deviation of mean (median) IOCs is 0.081 (0.088).The mean standard deviation is 0.192.c) pairwise IOC i j for firms with in-degree between 16 and 35, 16 ≤ k in i ≤ 35.The mean over the industries' mean (median) IOC is 0.269 (0.241), the standard deviation of mean (median) IOCs is 0.083 (0.091).The mean standard deviation is 0.168.
Figure S3: Distributions of pairwise output vector overlaps, OOC i j , of firms across NACE 2 industries for three out-degree size bins. NACE2 classes are on the x-axis; overlap coefficients on the y-axis. a) pairwise OOC i j for firms with out-degree between one and five, 1 ≤ k out i ≤ 5. The mean over the industries' mean (median) OOC is 0.110 (0.021), the standard deviation of mean (median) OOCs is 0.094 (0.118). The mean standard deviation is 0.226. b) pairwise OOC i j for firms with out-degree between 6 and 15, 6 ≤ k out i ≤ 15. The mean over the industries' mean (median) OOC is 0.157 (0.135), the standard deviation of mean (median) OOCs is 0.078 (0.074). The mean standard deviation is 0.129. c) pairwise OOC i j for firms with out-degree between 16 and 35, 16 ≤ k out i ≤ 35. The mean over the industries' mean (median) OOC is 0.223 (0.215), the standard deviation of mean (median) OOCs is 0.078 (0.078). The mean standard deviation is 0.109.
Figure S4: Pairwise similarity distributions of input- and output-vectors of firms within each NACE4 industry. Similarity is measured with the overlap coefficient for firms with more than 35 suppliers (a) and buyers (b), respectively. The y-axis denotes the overlap coefficients, the x-axis shows the NACE4 code for the respective boxplots. The dark blue horizontal bars correspond to the median (p 50% ), dark blue vertical lines to the interquartile range (p 25% to p 75% ), and thin light blue vertical lines to error bars (p 5% to p 95% ). Thin black vertical lines separate NACE1 classes. Empty columns indicate sectors with fewer than two firms in this degree bin. a) distributions of pairwise intra-industry input overlap coefficients, IOC i j . The average of the mean (median) input overlaps, across NACE2 industries is 0.237 (0.216) and the standard deviation of mean (median) input overlaps is 0.11 (0.12). The average standard deviation is 0.126. This indicates that relatively low input overlaps are the norm, but there are several outliers with higher similarities. b) distributions of pairwise intra-industry output overlap coefficients, OOC i j . The average of the mean (median) output overlaps, across NACE2 industries is 0.231 (0.207) and the standard deviation of mean (median) output overlaps is 0.179 (0.19), indicating that relatively low output overlaps are the norm, but there are relatively many outliers with higher similarities. The average standard deviation is 0.135. Output overlaps are on average only slightly lower than the input overlaps, but there is more variation across industries. If industry-level aggregation were fully representative for the IO-vectors of firms, in both panels all distributions would correspond to a single bar at the value 1.

SI Section 5. Overlap coefficients for NACE 4 level input output vectors
In this section we show that the pairwise input overlaps, IOC i j , and output overlaps, OOC i j , are lower for all pairs of firms within NACE 4 industries for the NACE 4 level input and output vectors.Remember, in the previous analysis we have computed the overlaps for all pairs of firms within a NACE2 industry and on the NACE2 level input and output vectors.
In the following figures we show the pairwise overlap coefficient distributions of input- and output-vectors of firms within each NACE4 industry for the respective degree bins 1-5, 6-15, 16-35, and >35. The y-axis denotes the overlap coefficients, the x-axis shows the NACE4 code for the respective boxplots. The dark horizontal bars correspond to the median (p 50% ), dark vertical lines to the interquartile range (p 25% to p 75% ), and thin light vertical lines to error bars (p 5% to p 95% ). Thin black vertical lines separate NACE1 classes. Empty columns indicate sectors with fewer than two firms in this degree bin.
First, we show the distributions of the input overlap coefficients, IOC i j . SI Fig. S4a shows the distributions of pairwise intra-industry input overlap coefficients, IOC i j , for firms with more than 35 suppliers, k in > 35. The average of the mean (median) input overlaps, across NACE2 industries is 0.237 (0.216) and the standard deviation of mean (median) input overlaps is 0.11 (0.12). The average standard deviation is 0.126. This indicates that relatively low input overlaps are the norm, but there are several outliers with higher similarities. SI Fig. S6a shows the distributions of pairwise input overlap coefficients, IOC i j , for firms with in-degree between one and five, 1 ≤ k in i ≤ 5. The mean over the industries' mean (median) IOC i j is 0.063 (0.005), the standard deviation of mean (median) IOCs is 0.074 (0.057). The mean standard deviation is 0.168. SI Fig. S6b shows the distributions of pairwise input overlap coefficients, IOC i j , for firms with in-degree between 6 and 15, 6 ≤ k in i ≤ 15. The mean over the industries' mean (median) IOC i j is 0.112 (0.063), the standard deviation of mean (median) IOCs is 0.084 (0.083). The mean standard deviation is 0.139. SI Fig. S6c shows the distributions of pairwise input overlap coefficients, IOC i j , for firms with in-degree between 16 and 35, 16 ≤ k in i ≤ 35. The mean over the industries' mean (median) IOC i j is 0.165 (0.140), the standard deviation of mean (median) IOCs is 0.107 (0.112). The mean standard deviation is 0.130.
Second, we show the distributions of the output overlap coefficients, OOC i j . SI Fig. S4b shows the distributions of pairwise intra-industry output overlap coefficients, OOC i j , for firms with more than 35 buyers, k out > 35. The average of the mean (median) output overlaps, across NACE2 industries is 0.231 (0.207) and the standard deviation of mean (median) output overlaps is 0.179 (0.19), indicating that relatively low output overlaps are the norm, but there are relatively many outliers with higher similarities. The average standard deviation is 0.135. Output overlaps are on average only slightly lower than the input overlaps, but there is more variation across industries. SI Fig. S7a shows the distributions of pairwise output overlap coefficients, OOC i j , for firms with out-degree between one and five, 1 ≤ k out i ≤ 5. The mean over the industries' mean (median) OOC i j is 0.056 (0.005), the standard deviation of mean (median) OOCs is 0.075 (0.054). The mean standard deviation is 0.148. SI Fig. S7b shows the distributions of pairwise output overlap coefficients, OOC i j , for firms with out-degree between 6 and 15, 6 ≤ k out i ≤ 15. The mean over the industries' mean (median) OOC i j is 0.081 (0.057), the standard deviation of mean (median) OOCs is 0.063 (0.068). The mean standard deviation is 0.087. SI Fig. S7c shows the distributions of pairwise output overlap coefficients, OOC i j , for firms with out-degree between 16 and 35, 16 ≤ k out i ≤ 35. The mean over the industries' mean (median) OOC i j is 0.127 (0.116), the standard deviation of mean (median) OOCs is 0.080 (0.082). The mean standard deviation is 0.078. Note that if industry-level aggregation were fully representative for the IO-vectors of firms, in all figures all distributions would correspond to a single bar at the value 1.
Figure S5: Increase of input-and output-vector similarity with increasing in-degree, k in , and out-degree, k out , bins (1-5, 6-15, 16-35, >35).a) boxplots of the median input overlap coefficients for all NACE 4 industries for each in-degree bin, respectively.b) boxplots of the median out overlap coefficients for all NACE 4 industries for each out-degree bin, respectively.It is clearly visible that input-and output-vectors of firms within industries become on average more similar (higher median IOC and OOC values) with the number of suppliers and buyers.
Next we show specifically how the average overlap coefficients increase with the degree of firms. Fig. S5 illustrates this relationship by showing for each degree size bin (1-5, 6-15, 16-35, >35) on the x-axis, the boxplot of the NACE 4 industries' median overlap coefficients on the y-axis. Fig. S5a shows boxplots of the median input overlap coefficients, IOC i j , for all NACE4 industries for each in-degree bin, respectively. We see that for the bin with 1 to 5 suppliers most medians are zero. Then the distribution of medians is shifted upwards for the bin of 6-15 suppliers and it continues to increase for the other two in-degree bins with 16-35 and more than 35 suppliers, respectively. Note that even for the two highest degree bins medians can range from almost zero to above 0.8. Fig. S5b shows boxplots of the median output overlap coefficients, OOC i j , for all NACE4 industries for each out-degree bin, respectively. We see that for the bin with 1 to 5 buyers almost all medians are zero. Then the distribution of medians is slightly shifted upwards for the bin of 6-15 buyers, but there are several outlier industries with higher output overlaps. The median OOCs continue to increase for the other two out-degree bins with 16-35 and more than 35 buyers, respectively. Note that even for the two highest degree bins medians can range from zero to around 0.8. It is visible that the tails of the median OOC distributions are longer than for the median IOC distributions. Overall, median OOCs appear lower than median IOCs.
Figure S6: Distributions of pairwise input vector overlaps, IOC i j , of firms across NACE 4 industries for three in-degree size bins.NACE 4 classes are on the x-axis; overlap coefficients on the y-axis.a) pairwise IOC i j for firms with in-degree between one and five, 1 ≤ k in i ≤ 5.The mean over the industries' mean (median) IOC is 0.063 (0.005), the standard deviation of mean (median) IOCs is 0.074 (0.057).The mean standard deviation is 0.168.b) pairwise IOC i j for firms with in-degree between 6 and 15, 6 ≤ k in i ≤ 15.The mean over the industries' mean (median) IOC is 0.112 (0.063), the standard deviation of mean (median) IOCs is 0.084 (0.083).The mean standard deviation is 0.139.c) pairwise IOC i j for firms with in-degree between 16 and 35, 16 ≤ k in i ≤ 35.The mean over the industries' mean (median) IOC is 0.165 (0.140), the standard deviation of mean (median) IOCs is 0.107 (0.112).The mean standard deviation is 0.130.
Figure S7: Distributions of pairwise output vector overlaps, OOC i j , of firms across NACE 4 industries for three out-degree size bins. NACE 4 classes are on the x-axis; overlap coefficients on the y-axis. a) pairwise OOC i j for firms with out-degree between one and five, 1 ≤ k out i ≤ 5. The mean over the industries' mean (median) OOC is 0.056 (0.005), the standard deviation of mean (median) OOCs is 0.075 (0.054). The mean standard deviation is 0.148. b) pairwise OOC i j for firms with out-degree between 6 and 15, 6 ≤ k out i ≤ 15. The mean over the industries' mean (median) OOC is 0.081 (0.057), the standard deviation of mean (median) OOCs is 0.063 (0.068). The mean standard deviation is 0.087. c) pairwise OOC i j for firms with out-degree between 16 and 35, 16 ≤ k out i ≤ 35. The mean over the industries' mean (median) OOC is 0.127 (0.116), the standard deviation of mean (median) OOCs is 0.080 (0.082). The mean standard deviation is 0.078.
Figure S8: a-d) show input Jaccard Indices, IJI i j , and e-h) output Jaccard Indices, OJI i j , visualized as histograms, for four in-degree, k in i (number of suppliers), and out-degree, k out i (number of buyers), bins, respectively. Jaccard Index values are on the x-axis in bins of width 0.05; the y-axis shows the frequency to fall in the respective bin. Vertical solid lines correspond to median and dashed lines to mean overlap coefficients. a) pairwise IJI i j for 351 firms with 1 ≤ k in i ≤ 5. The median and mean input Jaccard Index is 0 and 0.141, respectively; the standard deviation is 0.261. b) pairwise IJI i j for 102 firms with 6 ≤ k in i ≤ 15. The median and mean input Jaccard Index is 0.2 and 0.204, respectively; the standard deviation is 0.109. c) pairwise IJI i j for 49 firms with 16 ≤ k in i ≤ 35. The median and mean input Jaccard Index is 0.231 and 0.237, respectively; the standard deviation is 0.091. d) pairwise IJI i j for 62 firms with 35 < k in i . The median and mean input Jaccard Index is 0.425 and 0.43, respectively; the standard deviation is 0.119. It is clearly visible that the similarity of input vectors is low for all size bins, but increases on average with the number of suppliers. e) pairwise OJI i j for 468 firms with 1 ≤ k out i ≤ 5. The median and mean output Jaccard Index is 0 and 0.054, respectively; the standard deviation is 0.163. f) pairwise OJI i j for 118 firms with 6 ≤ k out i ≤ 15. The median and mean output Jaccard Index is 0.1 and 0.115, respectively; the standard deviation is 0.109. g) pairwise OJI i j for 33 firms with 16 ≤ k out i ≤ 35. The median and mean output Jaccard Index is 0.2 and 0.212, respectively; the standard deviation is 0.127. h) pairwise OJI i j for 13 firms with 35 < k out i . The median and mean output Jaccard Index is 0.267 and 0.265, respectively; the standard deviation is 0.106. The similarity of output vectors is even lower than for input vectors, and also increases on average with the number of buyers. If industry-level aggregation were fully representative for the IO-vectors of firms in NACE C26, in all panels the distributions would correspond to a single bar at the value 1.

SI Section 6. Jaccard Index confirms low similarities
The Jaccard Index for two binary vectors x, y of dimension m can be defined as $J(x, y) = \sum_{k=1}^{m} \min(x_k, y_k) \, / \, \sum_{k=1}^{m} \max(x_k, y_k)$. For firm i we define the binary input vector, π in i , as π in ik = 1 if Πin ik > 0 and the binary output vector, π out i , as π out ik = 1 if Πout ik > 0. Analogously to the IOC and OOC we define the pairwise input vector Jaccard Index, IJI, and the pairwise output vector Jaccard Index, OJI, of two firms i and j as $\mathrm{IJI}_{ij} = J(\pi^{\mathrm{in}}_i, \pi^{\mathrm{in}}_j)$ and $\mathrm{OJI}_{ij} = J(\pi^{\mathrm{out}}_i, \pi^{\mathrm{out}}_j)$.
Results for the Jaccard Index
We show that the results from the main text do not depend on the specific similarity measure. We show that the results of Fig. 2 and Fig. 3 are qualitatively similar when using IJI and OJI instead of IOC and OOC.
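As a small illustration of the binary-vector Jaccard Index, the sketch below binarizes two input (or output) vectors and divides the number of jointly used industries by the number of industries used by at least one of the two firms. The helper names and the convention of returning 0.0 for two all-zero vectors are assumptions of this example, not taken from the paper.

```python
def binarize(vector):
    """Binary vector: 1 where the firm has a positive input (output) share, else 0."""
    return [1 if value > 0 else 0 for value in vector]


def jaccard(x, y):
    """Jaccard Index of two binary vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    both = sum(1 for a, b in zip(x, y) if a and b)      # industries used by both firms
    either = sum(1 for a, b in zip(x, y) if a or b)     # industries used by at least one
    # Assumed convention: two empty vectors have similarity 0.
    return both / either if either else 0.0


# Two hypothetical firms over four industries.
firm_i = binarize([0.5, 0.5, 0.0, 0.0])
firm_j = binarize([0.25, 0.0, 0.75, 0.0])
similarity = jaccard(firm_i, firm_j)  # one shared industry out of three used
```

Because only the presence of a link to an industry counts, a small shared input share contributes as much to the index as a large one.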
First, we show the pairwise similarity distributions of input and output vectors for firms of NACE class 26, Manufacture of computer, electronic and optical products, measured with the Jaccard Index. SI Fig. S8a-d show input Jaccard Indices, IJI i j , and SI Fig. S8e-h output Jaccard Indices, OJI i j , visualized as histograms, for four in-degree, k in i (number of suppliers), and out-degree, k out i (number of buyers), bins, respectively. Jaccard Index values are on the x-axis in bins of width 0.05; the y-axis shows the frequency to fall in the respective bin. Vertical solid lines correspond to median and dashed lines to mean overlap coefficients. SI Fig. S8a-d shows that the median and mean similarities of input vectors measured by the IJI are slightly higher than for the IOC. The medians are 0, 0.2, 0.231 and 0.425 for the IJI and 0, 0.121, 0.199, and 0.343 for the IOC. The differences in means are slightly smaller. Further, the standard deviation is smaller for the IJI values than for the IOC values (0.261, 0.109, 0.091, 0.119, vs. 0.282, 0.192, 0.161, 0.148). For smaller size bins the IJI distribution is also right skewed, but less so, and the distribution becomes symmetric faster than for the IOC. As indicated by the lower standard deviations, the distributions are narrower. In general the similarities are relatively low and far away from the value of one, which would indicate that industry-level aggregation is representative for firm-level input vectors. Fig. S8e-h shows that the median and mean similarities of output vectors measured by the OJI are slightly higher than for the OOC. The medians are 0, 0.1, 0.2 and 0.267 for the OJI and 0, 0.025, 0.119, and 0.119 for the OOC. The differences in the means are smaller. Further, the standard deviation is smaller for the OJI values than for the OOC values (0.163, 0.109, 0.127, 0.106, vs. 0.190, 0.141, 0.156, 0.123). For smaller size bins the OJI distribution is also right skewed, but less so, and the distribution becomes symmetric faster. As indicated by the lower standard deviations, the distributions are narrower. In general the similarities are relatively low and far away from the value of one, which would indicate that industry-level aggregation is representative for firm-level output vectors.
Figure S9: Pairwise similarity distributions of input- and output-vectors of firms within each NACE2 industry. Similarity is measured with the Jaccard Index for firms with more than 35 suppliers (a) and buyers (b), respectively. The y-axis denotes the Jaccard Index, the x-axis shows the NACE2 code for the respective boxplots. The dark blue horizontal bars correspond to the median (p 50% ), dark blue vertical lines to the interquartile range (p 25% to p 75% ), and thin light blue vertical lines to error bars (p 5% to p 95% ). Thin black vertical lines separate NACE1 classes. Empty columns indicate sectors with fewer than two firms in this degree bin. a) distributions of pairwise intra-industry input Jaccard Index, IJI i j . The average of the mean (median) input Jaccard Index, across NACE2 industries is 0.398 (0.394) and the standard deviation of mean (median) input Jaccard Index is 0.07 (0.067). The average standard deviation is 0.031. This indicates that relatively low input similarity is the norm with few outliers. b) distributions of pairwise intra-industry output vector Jaccard Index, OJI i j . The average of the mean (median) output overlaps, across NACE2 industries is 0.301 (0.291) and the standard deviation of mean (median) output Jaccard Index values is 0.076 (0.077), indicating that relatively low output similarities are the norm with few outliers. The average standard deviation is 0.031. Output overlaps are on average lower than the input overlaps, but there is only slightly more variation across industries. If industry-level aggregation were fully representative for the IO-vectors of firms, in both panels all distributions would correspond to a single bar at the value 1.
The pattern of increasing similarity with degree also holds true for IJI and OJI. SI Fig. S9 shows the IJI and OJI for the degree bins of firms with more than 35 suppliers (k in > 35) and more than 35 customers (k out > 35), respectively. SI Fig. S9a shows that, as for NACE class C26, the average similarity is slightly higher for the IJI than for the IOC. For the IJI i j the average of the mean (median) input overlaps, across NACE2 industries is 0.398 (0.394) and the standard deviation of mean (median) input overlaps is 0.07 (0.067). For the IOC i j the average of the mean (median) input overlaps, across NACE2 industries is 0.35 (0.33) and the standard deviation of mean (median) input overlaps is 0.084 (0.102). The average standard deviation for IJI is 0.031, which is substantially lower than the average standard deviation for the IOC of 0.156. This implies that the distributions are on average more concentrated for the Jaccard Index based input vector similarity. This is not surprising as both measures have a similar numerator, but the binary counting of input vectors in the Jaccard Index probably reduces the range of possible low outliers. This is because the binary counting of the JI tends to give a higher weight to overlaps that are small when measured with the OC (the JI denominator divides in the best case by the number of joint inputs and in the worst case by the number of different inputs of both firms added up). The same reasoning could explain the slightly higher average similarity values of JI over OC.
SI Fig. S9b shows the results for the pairwise output vector similarity based on the Jaccard Index. For the OJI i j the average of the mean (median) output vector Jaccard Index, across NACE2 industries is 0.301 (0.291) and the standard deviation of mean (median) output Jaccard Index is 0.076 (0.077). For the OOC i j the average of the mean (median) output overlaps, across NACE2 industries is 0.282 (0.257) and the standard deviation of mean (median) output overlaps is 0.147 (0.161). Again the average Jaccard Index based similarity, OJI, is slightly higher than the average output overlap coefficient, OOC. The average standard deviation for OJI is 0.031, which is substantially lower than the average standard deviation for the OOC of 0.17. This implies that the distributions are on average more concentrated for the Jaccard Index based output vector similarity.
Overall we observe a qualitatively similar degree of similarity when using the Jaccard Index instead of the overlap coefficient.
Figure S10: a-d) illustrate the distributions of IOC t,t−1 across all NACE 2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median and (mean) IOCs over time, IOC t,t−1 , are 0.805 (0.678), 0.755 (0.712), 0.797 (0.761) and 0.847 (0.814), respectively, indicated by the vertical solid (dashed) lines. The standard deviations for the in-degree bins are 0.345, 0.203, 0.161 and 0.142, respectively, and decreasing with the number of in-links. e-h) illustrate the distributions of OOC t,t−1 for the respective out-degree bins. The median and (mean) OOCs over time, OOC t,t−1 , are 0.922 (0.737), 0.816 (0.778), 0.869 (0.841) and 0.847 (0.814). The standard deviations for the out-degree bins are 0.34, 0.209, 0.163 and 0.128; again decreasing with the out-link number. The similarity of firms' input- and output-vectors over time is substantially higher than for the pairwise intra-industry similarities.

SI Section 7. Input and output vectors are similar over time
In this section we show that the low pairwise IOC and OOC values for firms within the same industries are not a generic feature of the micro-level data. The similarity of firms' input and output vectors over time is substantially higher than the intra-industry similarities. To show this we calculate for each firm the overlap coefficient of its relative input vector in the year t with its relative input vector in the previous year t − 1 as $\mathrm{IOC}^{t,t-1}_i = \sum_k \min\big(\Pi^{\mathrm{in}}_{ik}(t), \Pi^{\mathrm{in}}_{ik}(t-1)\big)$ (S.1). Analogously, we compute the output overlap coefficient between two years t and t − 1 as $\mathrm{OOC}^{t,t-1}_i = \sum_k \min\big(\Pi^{\mathrm{out}}_{ik}(t), \Pi^{\mathrm{out}}_{ik}(t-1)\big)$ (S.2). The two measures indicate the fraction of total inputs (outputs) that is spent on (sold to) the same industries in the two years. We calculate the overlap coefficients over time for the years 2019 and 2018. Firms are allocated into the respective in- and out-degree bins based on their number of suppliers or customers in the year 2018.
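The year-to-year overlap can be sketched as follows, assuming the overlap coefficient is the sum of component-wise minima of a firm's relative (normalized) vectors in the two years. The function names and the dict-based representation of purchase volumes per supplier industry are hypothetical choices for this example.

```python
def normalize(volumes):
    """Turn volumes per industry into shares that sum to one."""
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("firm has no positive volumes in this year")
    return {industry: v / total for industry, v in volumes.items() if v > 0}


def overlap_over_time(previous_year, current_year):
    """Assumed overlap coefficient between a firm's vectors in years t-1 and t.

    Interpretable as the fraction of total inputs (outputs) spent on (sold to)
    the same industries in both years.
    """
    prev, curr = normalize(previous_year), normalize(current_year)
    return sum(min(prev[k], curr[k]) for k in prev.keys() & curr.keys())


# Hypothetical firm: purchases by supplier industry in 2018 and 2019.
inputs_2018 = {"C24": 60.0, "G46": 40.0}
inputs_2019 = {"C24": 30.0, "G46": 30.0, "H49": 40.0}
ioc_over_time = overlap_over_time(inputs_2018, inputs_2019)
```

In this example the firm keeps both supplier industries but shifts 40% of its purchases to a new one, so 60% of its input shares coincide across the two years.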
In SI Fig. S10 we show the distribution of input and output overlap coefficients of firms' input- and output-vectors across the years 2019 and 2018 over all NACE2 industries. The overlap coefficients, OC, are on the x-axis and counts for the respective OC-value bin on the y-axis. SI Fig. S10a-d illustrates the distributions of IOC t,t−1 across all NACE 2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median and (mean) IOCs over time, IOC t,t−1 , are 0.805 (0.678), 0.755 (0.712), 0.797 (0.761) and 0.847 (0.814), respectively, and thus substantially higher than for the intra-industry IOCs. The standard deviations for the in-degree bins are 0.345, 0.203, 0.161 and 0.142, respectively, and decreasing with the number of in-links. The distributions are left skewed, i.e. very low overlap coefficients are outliers, and bi-modal for the smallest in-degree bin. In all four bins there are firms having almost zero input overlap in the two years. While this number is relatively high for the smallest in-degree bin, it decreases strongly for higher in-degree bins. For firms with few suppliers this is most likely due to the change of a single or the primary supplier. For the few cases where firms with many suppliers have almost no overlap, the likely explanation is that they went out of business between the two years and did not source inputs anymore in the second year. As the network is growing (due to a reduction of the link reporting threshold in mid-2018), the overlaps over time shown here might be smaller than in practice. Therefore, we check also the probability of retaining an input type from the year 2018 in the year 2019 and find that these are even higher than the overlap coefficients; for details see SI Fig. S11a-d.
Figure S11: a-d) illustrate the distributions of IRP t,t−1 across all NACE 2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median and (mean) IRPs over time, IRP t,t−1 , are 1 (0.847), 1 (0.882), 0.923 (0.902) and 0.952 (0.933), respectively, indicated by the vertical solid (dashed) lines. The means increase with the in-degree. The standard deviations for the in-degree bins are 0.316, 0.173, 0.133 and 0.108, respectively, and decreasing with the number of in-links. With increasing in-degree the distributions become more concentrated on the value 1, i.e. most firms retain almost all NACE2 input types. e-h) illustrate the distributions of ORP t,t−1 for the respective out-degree bins. The median and (mean) ORPs over time, ORP t,t−1 , are 1 (0.872), 1 (0.883), 1 (0.913) and 0.962 (0.933), i.e. means increase with out-degree. The standard deviations for the out-degree bins are 0.295, 0.177, 0.130 and 0.100; again decreasing with the out-link number. With increasing out-degree the distributions become more concentrated on the value 1, i.e. most firms retain almost all NACE2 customer industries. The similarity of firms' input- and output-vectors over time is slightly higher than the intra-industry similarities.
Analogously, Fig. S10e-h illustrates the distributions of OOC t,t−1 for the respective out-degree bins. The median and (mean) OOCs over time, OOC t,t−1 , are 0.922 (0.737), 0.816 (0.778), 0.869 (0.841) and 0.847 (0.814), respectively, and thus substantially higher than for the intra-industry OOCs and slightly higher than the IOCs over time. The standard deviations for the out-degree bins are 0.34, 0.209, 0.163 and 0.128; again decreasing with the number of out-links. The distributions are left skewed and, for the smallest out-degree bin, bi-modal. In all four bins there are firms having almost zero output overlap in the two years, but substantially less so than for the IOC t,t−1 . The probability of retaining an output type (buyer industry) from the year 2018 in the year 2019 is again higher than the overlap coefficients; for details see SI Fig. S11e-h.
To show that firms overwhelmingly keep existing input and buyer industries we calculate for each firm the input retention probability and the output retention probability from the binary input and output vectors of a year t with the previous year t − 1. Recall the binary input vector, π in i , is defined as π in ik = 1 if Πin ik > 0 and the binary output vector, π out i , as π out ik = 1 if Πout ik > 0. We define the input retention probability, IRP t,t−1 , for a firm i between two years t and t − 1 as $\mathrm{IRP}^{t,t-1}_i = \sum_k \pi^{\mathrm{in}}_{ik}(t)\, \pi^{\mathrm{in}}_{ik}(t-1) \, / \, \sum_k \pi^{\mathrm{in}}_{ik}(t-1)$ (S.3). IRP t,t−1 is the probability that a random input contained in the input vector of firm i in year t − 1 is still present in the input vector of firm i at time t. Analogously, we compute the output retention probability, ORP t,t−1 , for a firm i between two years t and t − 1 as $\mathrm{ORP}^{t,t-1}_i = \sum_k \pi^{\mathrm{out}}_{ik}(t)\, \pi^{\mathrm{out}}_{ik}(t-1) \, / \, \sum_k \pi^{\mathrm{out}}_{ik}(t-1)$ (S.4). ORP t,t−1 is the probability that a random buyer industry contained in the output vector of firm i in year t − 1 is still present in the output vector of firm i at time t.
Figure S12: The OC is on the x-axis and the counts for the respective OC-value bin on the y-axis. a-d) illustrate the distributions of IOC t,t−1 across all NACE 2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median and (mean) IOCs over time, IOC t,t−1 , are 0.718 (0.642), 0.743 (0.686), 0.765 (0.730) and 0.818 (0.759), respectively, indicated by the vertical solid (dashed) lines, and increasing with in-degree. The standard deviations for the in-degree bins are 0.34, 0.217, 0.167 and 0.188, respectively, and decreasing with the number of in-links. e-h) illustrate the distributions of OOC t,t−1 for the respective out-degree bins. The median and (mean) OOCs over time, OOC t,t−1 , are 0.857 (0.719), 0.759 (0.727), 0.801 (0.740) and 0.768 (0.768). Only the means are increasing, but not the medians. The standard deviations for the out-degree bins are 0.321, 0.206, 0.176 and 0.115; again decreasing with the out-link number. The similarity of firms' input- and output-vectors over time is substantially higher than the intra-industry similarities.
We calculate IRP and ORP over time for the years 2019 and 2018 for each firm. Firms are allocated into the respective in- and out-degree bins based on their number of suppliers or customers in the year 2018. The results are shown as histograms in SI Fig. S11, where the retention probabilities are on the x-axis and counts for the respective RP-value bins on the y-axis. SI Fig. S11a-d illustrate the distributions of IRP t,t−1 across all NACE 2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median and (mean) IRPs over time, IRP t,t−1 , are 1 (0.847), 1 (0.882), 0.923 (0.902) and 0.952 (0.933), respectively, indicated by the vertical solid (dashed) lines. The means increase with the in-degree. The standard deviations for the in-degree bins are 0.316, 0.173, 0.133 and 0.108, respectively, and decreasing with the number of in-links. With increasing in-degree the distributions become more concentrated on the value 1, i.e. most firms retain almost all NACE2 input types. SI Fig. S11e-h illustrate the distributions of ORP t,t−1 for the respective out-degree bins. The median and (mean) ORPs over time, ORP t,t−1 , are 1 (0.872), 1 (0.883), 1 (0.913) and 0.962 (0.933). The means increase with the out-degree. The standard deviations for the out-degree bins are 0.295, 0.177, 0.130 and 0.100; again decreasing with the out-link number. With increasing out-degree the distributions become more concentrated on the value 1, i.e. most firms retain almost all NACE2 customer industries. The similarity of firms' input- and output-vectors over time is substantially higher than the intra-industry similarities.
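A retention probability of this kind can be sketched as the share of industries present in year t − 1 that are still present in year t. The function name, the dict representation, and the choice to return `None` for a firm without any links in year t − 1 are assumptions of this illustration.

```python
def retention_probability(previous_year, current_year):
    """Share of industries with a positive entry in year t-1 that remain in year t.

    Both arguments map an industry code to a volume; only the binary
    information (volume > 0) is used.
    """
    prev_types = {k for k, v in previous_year.items() if v > 0}
    curr_types = {k for k, v in current_year.items() if v > 0}
    if not prev_types:
        return None  # assumed convention: undefined without links in year t-1
    return len(prev_types & curr_types) / len(prev_types)


# Hypothetical firm: one of two 2018 input industries is retained in 2019.
inputs_2018 = {"C24": 60.0, "G46": 40.0, "H49": 0.0}
inputs_2019 = {"C24": 30.0, "H49": 40.0}
irp = retention_probability(inputs_2018, inputs_2019)
```

Unlike the overlap coefficient, this measure ignores how much volume shifts between industries, so newly added industries in year t do not lower it.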

Overlaps over time for industries
In this section we show the distribution of input and output overlap coefficients over time for specific NACE2 industries. For completeness we illustrate the similarity over time for NACE2 industry C26 in SI Fig. S12. The overlap coefficient, OC, is on the x-axis and counts for the respective OC-value bin on the y-axis. SI Fig. S12a-d illustrate the distributions of IOC t,t−1 across all NACE 2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median and (mean) IOCs over time, IOC t,t−1 , are 0.718 (0.642), 0.743 (0.686), 0.765 (0.730) and 0.818 (0.759), respectively, indicated by the vertical solid (dashed) lines, and increasing with in-degree. The standard deviations for the in-degree bins are 0.34, 0.217, 0.167 and 0.188, respectively, and decreasing with the number of in-links. SI Fig. S12e-h illustrate the distributions of OOC t,t−1 for the respective out-degree bins. The median and (mean) OOCs over time, OOC t,t−1 , are 0.857 (0.719), 0.759 (0.727), 0.801 (0.740) and 0.768 (0.768). Only the means are increasing, but not the medians. The standard deviations for the out-degree bins are 0.321, 0.206, 0.176 and 0.115; again decreasing with the out-link number. The similarity of firms' input- and output-vectors over time is substantially higher than the intra-industry similarities. For NACE C26 neither input nor output overlaps are consistently larger across degree bins.
Figure S13: a) distributions of input overlap coefficients, IOC t,t−1 , for firms with more than 35 suppliers. The average of the mean (median) input overlaps, across NACE2 industries is 0.784 (0.807) and the standard deviation of mean (median) input overlaps is 0.099 (0.102). The average standard deviation is 0.124. This indicates that high input overlaps are the norm with few outliers. b) distributions of pairwise intra-industry output overlap coefficients, OOC t,t−1 . The average of the mean (median) output overlaps, across NACE2 industries is 0.813 (0.828) and the standard deviation of mean (median) output overlaps is 0.101 (0.103), indicating that high output overlaps are the norm with few outliers. The average standard deviation is 0.108. Output overlaps are on average slightly higher than input overlaps.
Next we look at the distributions of $\mathrm{IOC}_{t,t-1}$ and $\mathrm{OOC}_{t,t-1}$ across NACE2 industries. For the following figures, the y-axis denotes the overlap coefficients between the two years and the x-axis shows the NACE2 code for the respective boxplots. The dark horizontal bars correspond to the median ($p_{50\%}$), dark vertical lines to the interquartile range ($p_{25\%}$ to $p_{75\%}$), and thin light vertical lines to error bars ($p_{5\%}$ to $p_{95\%}$). Thin black vertical lines separate NACE1 classes. Empty columns indicate sectors with fewer than two firms in this degree bin.
First, we focus on the distributions of input overlaps for the years 2019 and 2018, $\mathrm{IOC}_{t,t-1}$, in SI Fig. S13a and SI Fig. S14. SI Fig. S13a shows the distributions of firms' input overlap coefficients, $\mathrm{IOC}_{t,t-1}$, for firms with more than 35 suppliers, $k^{in}_i > 35$. The average of the mean (median) input overlaps across NACE2 industries is 0.784 (0.807) and the standard deviation of mean (median) input overlaps is 0.099 (0.102). The average standard deviation is 0.124. This indicates that high input overlaps are the norm with few outliers. SI Fig. S14a shows the distributions of input overlap coefficients, $\mathrm{IOC}_{t,t-1}$, for firms with in-degree between 1 and 5, $1 \le k^{in}_i \le 5$. The mean over the industries' mean (median) $\mathrm{IOC}_{t,t-1}$ is 0.660 (0.763), the standard deviation of mean (median) IOCs is 0.079 (0.104). The mean standard deviation is 0.334. SI Fig. S14b shows the distributions of input overlap coefficients, $\mathrm{IOC}_{t,t-1}$, for firms with in-degree between 6 and 15, $6 \le k^{in}_i \le 15$. The mean over the industries' mean (median) $\mathrm{IOC}_{t,t-1}$ is 0.692 (0.729), the standard deviation of mean (median) IOCs is 0.097 (0.100). The mean standard deviation is 0.184. SI Fig. S14c shows the distributions of input overlap coefficients, $\mathrm{IOC}_{t,t-1}$, for firms with in-degree between 16 and 35, $16 \le k^{in}_i \le 35$. The mean over the industries' mean (median) $\mathrm{IOC}_{t,t-1}$ is 0.730 (0.094), the standard deviation of mean (median) IOCs is 0.094 (0.096). The mean standard deviation is 0.162.
Second, we focus on the distributions of output overlaps for the years 2019 and 2018, $\mathrm{OOC}_{t,t-1}$, in SI Fig. S13b and SI Fig. S15. SI Fig. S13b shows the distributions of output overlap coefficients, $\mathrm{OOC}_{t,t-1}$, for firms with more than 35 customers, $k^{out}_i > 35$. The average of the mean (median) output overlaps across NACE2 industries is 0.813 (0.828) and the standard deviation of mean (median) output overlaps is 0.101 (0.103), indicating that high output overlaps are the norm with few outliers. The average standard deviation is 0.108. Output overlaps are on average slightly higher than input overlaps for the degree bin >35. SI Fig. S15a shows the distributions of output overlap coefficients, $\mathrm{OOC}_{t,t-1}$, for firms with out-degree between 1 and 5, $1 \le k^{out}_i \le 5$. The mean over the industries' mean (median) $\mathrm{OOC}_{t,t-1}$ is 0.727 (0.876), the standard deviation of mean (median) OOCs is 0.085 (0.110). The mean standard deviation is 0.331. SI Fig. S15b shows the distributions of output overlap coefficients, $\mathrm{OOC}_{t,t-1}$, for firms with out-degree between 6 and 15, $6 \le k^{out}_i \le 15$. The mean over the industries' mean (median) $\mathrm{OOC}_{t,t-1}$ is 0.714 (0.749), the standard deviation of mean (median) OOCs is 0.084 (0.093). The mean standard deviation is 0.208. SI Fig. S15c shows the distributions of output overlap coefficients, $\mathrm{OOC}_{t,t-1}$, for firms with out-degree between 16 and 35, $16 \le k^{out}_i \le 35$. The mean over the industries' mean (median) $\mathrm{OOC}_{t,t-1}$ is 0.760 (0.780), the standard deviation of mean (median) OOCs is 0.119 (0.120). The mean standard deviation is 0.139.
Overall, the overlap coefficients of firms' input- and output-vectors for the years 2018 and 2019 are substantially higher than the pairwise overlap coefficients within industries.

Figure S14: Distributions of input overlap coefficients, $\mathrm{IOC}_{t,t-1}$, across NACE2 industries (overlap coefficients on the y-axis) for firms with in-degree a) $1 \le k^{in}_i \le 5$, b) $6 \le k^{in}_i \le 15$ and c) $16 \le k^{in}_i \le 35$.

Figure S15: Distributions of output overlap coefficients, $\mathrm{OOC}_{t,t-1}$, across NACE2 industries (overlap coefficients on the y-axis) for firms with out-degree a) $1 \le k^{out}_i \le 5$, b) $6 \le k^{out}_i \le 15$ and c) $16 \le k^{out}_i \le 35$.

SI Section 8. Constructing synthetic firm-level shocks with same sector level impacts
Recall that $\zeta$ is based on the actual employment reductions during the early phase of the COVID-19 pandemic. $\zeta_i$ is the reduction of the labor input of firm $i$ between January and May 2020, $\zeta_i = \max[1 - e_i(\mathrm{may})/e_i(\mathrm{jan}),\, 0]$, where $e_i$ is the number of employees in the respective month. In this section we describe the algorithm for constructing new synthetic firm-level shock vectors, $\zeta^1, \zeta^2, \dots, \zeta^{1{,}000}$, that differ in how firms within industries are affected, but are of exactly the same size when aggregated to the industry level. From these shocks we derive the remaining production level vectors, $\Psi = \{\psi^1, \psi^2, \dots, \psi^{1{,}000}\}$, that enter the shock propagation algorithm described in the Data and Methods section. We show how to construct a new shock vector, $\zeta^l$. This problem can be solved sequentially for all industries $k \in \{1, 2, \dots, m\}$, in our case the 593 NACE4 classes contained in the data and the additional industry we introduce for all firms without NACE information.
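The definition of the empirical labor shock can be illustrated with a minimal Python sketch. The function name `labor_shock` and the three example firms are hypothetical and only serve to show the formula $\zeta_i = \max[1 - e_i(\mathrm{may})/e_i(\mathrm{jan}), 0]$ and the relation $\psi = 1 - \zeta$; they are not taken from the data.

```python
def labor_shock(e_jan, e_may):
    """Return zeta_i = max(1 - e_may / e_jan, 0) for a single firm.

    Employment gains (e_may > e_jan) are clipped to a shock of zero.
    """
    if e_jan <= 0:
        raise ValueError("January employment must be positive")
    return max(1.0 - e_may / e_jan, 0.0)


# Hypothetical employment counts (January, May) for three firms.
employment = {"firm_a": (100, 80), "firm_b": (50, 55), "firm_c": (10, 0)}

# Firm-level shock zeta and remaining production level psi = 1 - zeta.
zeta = {f: labor_shock(jan, may) for f, (jan, may) in employment.items()}
psi = {f: 1.0 - z for f, z in zeta.items()}
```

In this toy example the firm that hired staff receives no shock, and the firm that lost all employees receives the maximal shock of one.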
We start by specifying additional industry-level notation. For a given industry $k \in \{1, 2, \dots, m\}$ we denote the number of firms within the industry as $n_k$. The indices of the $n_k$ firms in sector $k$ are denoted as $I_k = \{i \mid p_i = k\}$. The in- and out-strength of industry $k$ are defined as $s^{in,k} = \sum_{i=1}^{n} s^{in}_i \delta_{p_i,k}$ and $s^{out,k} = \sum_{i=1}^{n} s^{out}_i \delta_{p_i,k}$. The initial shock to sector $k$ can be defined either by aggregating the shock vector $\zeta$ or by aggregating the remaining production levels vector $\psi$, since $\psi = 1 - \zeta$. In Eq. [5] in the Data and Methods section, we derived the shock to sector $k$ by aggregating the vector $\psi$ according to firms' in- and out-strengths as

$$\phi^u_k = \frac{\sum_{i=1}^{n} \psi_i\, s^{in}_i\, \delta_{p_i,k}}{s^{in,k}}, \qquad \phi^d_k = \frac{\sum_{i=1}^{n} \psi_i\, s^{out}_i\, \delta_{p_i,k}}{s^{out,k}}.$$

$\phi^u_k$ indicates the fraction of goods sector $k$ is still buying from its supplier industries, i.e., the fraction of $k$'s in-strength, $s^{in,k}$, remaining after the shock. $\xi^u_k = 1 - \phi^u_k$ is the size of the corresponding demand shock that propagates upstream. $\phi^d_k$ indicates the fraction of goods sector $k$ is still selling to its buyer industries, i.e., the fraction of $k$'s out-strength, $s^{out,k}$, remaining after the shock. $\xi^d_k = 1 - \phi^d_k$ is the size of the corresponding supply shock that propagates downstream. For the shock propagation algorithm it is more convenient to work with $\psi$, $\phi^u$ and $\phi^d$, but for sampling new synthetic shocks we continue to work with $\zeta$, $\xi^u$ and $\xi^d$.
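The aggregation of a firm-level shock to the sector level can be sketched as follows. This is a minimal illustration that assumes the sector shock is the strength-weighted average of the firm shocks, so that $\xi^u_k s^{in,k} = \sum_{i \in I_k} \zeta_i s^{in}_i$ and analogously for the downstream shock. The function name `sector_shocks` and the toy numbers are hypothetical.

```python
def sector_shocks(zeta, s_in, s_out, sector, k):
    """Return (xi_u, xi_d) for sector k.

    zeta   : firm-level shocks in [0, 1]
    s_in   : firm in-strengths
    s_out  : firm out-strengths
    sector : industry affiliation p_i of every firm
    """
    members = [i for i, p in enumerate(sector) if p == k]
    s_in_k = sum(s_in[i] for i in members)    # in-strength of sector k
    s_out_k = sum(s_out[i] for i in members)  # out-strength of sector k
    xi_u = sum(zeta[i] * s_in[i] for i in members) / s_in_k
    xi_d = sum(zeta[i] * s_out[i] for i in members) / s_out_k
    return xi_u, xi_d


# Toy example: firms 0 and 1 belong to sector 0, firm 2 to sector 1.
zeta = [0.5, 0.0, 1.0]
s_in = [10.0, 30.0, 5.0]
s_out = [20.0, 20.0, 5.0]
sector = [0, 0, 1]

xi_u, xi_d = sector_shocks(zeta, s_in, s_out, sector, k=0)
phi_u, phi_d = 1.0 - xi_u, 1.0 - xi_d  # remaining in- and out-strength shares
```

The same firm-level shock yields different upstream and downstream sector shocks whenever the shocked firms differ in their ratio of in-strength to out-strength.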
Our goal is to find for each firm $i$ in industry $k$ (i.e., where $p_i = k$) an initial shock, $\zeta^l_i$, such that after aggregation the sector-level shock has the same size as the empirically defined original shock, $\zeta$, i.e.,

$$\sum_{i \in I_k} \zeta^l_i\, s^{in}_i = \xi^u_k\, s^{in,k}, \qquad \sum_{i \in I_k} \zeta^l_i\, s^{out}_i = \xi^d_k\, s^{out,k}, \qquad 0 \le \zeta^l_i \le 1 \;\; \forall\, i \in I_k. \tag{S.2}$$

The two right-hand-side terms, $\xi^u_k s^{in,k}$ and $\xi^d_k s^{out,k}$, are the target shock sizes that the new firm-level shock, $\zeta^l$, needs to fulfil for industry $k$. We know that at least one solution always exists, the original initial shock $\zeta$. Here, our sampled shocks fulfil Eq. [S.1] at the NACE4 level, as the NACE2 level constraint would lead to even higher variability in the resulting production losses. Note that if no firm-level shock is available and we want firm-level shocks that correspond to a specific industry-level shock, then the targeted shock size can be specified directly with the sector-level shock vectors, $(\xi^u_k, \xi^d_k)$. In this way we can construct many random firm-level shocks and obtain a distribution of production losses for the given industry-level shock. We solve sampling problem [S.2] in two steps.
Sampling new shocks. The first step is shown in detail in Algorithm 1. First, we define the auxiliary index set $\tilde{I}_k = \{i \mid p_i = k\}$ that contains all indices of firms belonging to sector $k$. The firm index, $i$, refers to the row and column index of firm $i$ in the adjacency matrix $W$ and to the position of $i$ in the industry affiliation vector $p$. Note that we use the terms "firm $i$" and "index $i$" interchangeably. We initialise the algorithm by setting the shock size for each firm $i$ in industry $k$ to zero, i.e., $\zeta^l_i \leftarrow 0$ for all $i \in I_k$. Then we add shocks to the values $\zeta^l_i$ ($\forall\, i \mid p_i = k$) until the new shock is at least as large as the original shock target, i.e., until

$$\sum_{i \in I_k} \zeta^l_i\, s^{in}_i \ge \xi^u_k\, s^{in,k} \quad \text{and} \quad \sum_{i \in I_k} \zeta^l_i\, s^{out}_i \ge \xi^d_k\, s^{out,k}. \tag{S.3}$$

The shocks are added in the following way. First we draw a firm index $i$ from the index set $\tilde{I}_k$ and delete the index $i$ from $\tilde{I}_k$. Then we draw a shock value, $\eta \in [0, 1]$, from a specified distribution that takes values between zero and one. Here we draw the shock values from the empirical distribution of employment shocks of sector $k$, i.e., $\eta \sim \{\zeta_j \mid j \in I_k\}$. Note that the empirical shock distribution, $\{\zeta_j \mid j \in I_k\}$, contains only values that lie between zero and one. We could also sample more general shocks by drawing values from, e.g., the Beta distribution, which is flexible enough to produce very concentrated or very evenly distributed shocks. Here we do not draw negative shocks, which would be interpreted as production gains or increases in production capacity, even though this would be possible for more general shocks. We add the additional shock value, $\eta$, to the previous shock level of firm $i$, i.e., $\zeta^l_i \leftarrow \min[1,\, \zeta^l_i + \eta]$. The $\min[1, \cdot]$ function is necessary because a firm $i$ can be drawn a second time for receiving a shock, but shocks cannot be larger than one: a firm cannot lose more than 100% of its production.

Since this procedure is continued until the necessary aggregate shock level $(\xi^u_k s^{in,k},\, \xi^d_k s^{out,k})$ is reached, it can happen that each firm has already been drawn and the index set $\tilde{I}_k$ is empty, i.e., $\tilde{I}_k = \emptyset$. In this case we fill up the index set again with all firms in industry $k$, i.e., we set $\tilde{I}_k \leftarrow \{i \mid p_i = k\}$. This can happen when in the original shock, $\zeta$, relatively large firms received relatively large shocks, and these large firms only received small shocks in the first round of Monte Carlo draws.
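The sampling step described above can be sketched in Python for the firms of a single sector. This is a hedged illustration, not the authors' implementation: the function name `draw_sector_shocks` and its arguments are hypothetical, the stopping rule assumes that both aggregate targets must be reached (with "at least as large" rather than strictly larger, so that an all-zero target terminates immediately), and the aggregate shock is recomputed in every iteration for clarity rather than speed.

```python
import random


def draw_sector_shocks(s_in, s_out, zeta_emp, rng):
    """Draw a synthetic shock vector for the firms of one sector.

    s_in, s_out : in- and out-strengths of the sector's firms
    zeta_emp    : empirical firm shocks in [0, 1] (defines the targets
                  and the distribution the new shocks are drawn from)
    rng         : random.Random instance
    """
    n = len(zeta_emp)
    # Target shock sizes xi_u * s_in_k and xi_d * s_out_k.
    target_in = sum(z * s for z, s in zip(zeta_emp, s_in))
    target_out = sum(z * s for z, s in zip(zeta_emp, s_out))

    zeta_new = [0.0] * n
    pool = []  # auxiliary index set of firms not yet drawn in this round

    def reached():
        tot_in = sum(z * s for z, s in zip(zeta_new, s_in))
        tot_out = sum(z * s for z, s in zip(zeta_new, s_out))
        return tot_in >= target_in and tot_out >= target_out

    while not reached():
        if not pool:                  # every firm was drawn: refill the set
            pool = list(range(n))
        i = pool.pop(rng.randrange(len(pool)))  # draw and delete an index
        eta = rng.choice(zeta_emp)    # draw from the empirical shocks
        zeta_new[i] = min(1.0, zeta_new[i] + eta)  # cap shocks at 100%
    return zeta_new


# Hypothetical sector with four firms.
s_in = [10.0, 30.0, 60.0, 5.0]
s_out = [20.0, 20.0, 5.0, 40.0]
zeta_emp = [0.5, 0.0, 1.0, 0.2]
zeta_new = draw_sector_shocks(s_in, s_out, zeta_emp, random.Random(0))
```

The resulting vector generally overshoots the targets; the subsequent rescaling step brings the aggregate shock back to the exact target sizes.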
Algorithm 1: Drawing shocks for firms in industry $k$. The algorithm starts by creating the set $\tilde{I}_k$ containing all firm indices of firms belonging to sector $k$.

Rescaling of shocks. In a second step we find weights to rescale the shocks, $\zeta^l_i$, such that the constraints in Eq. [S.2] hold exactly. The basic idea is to divide the firms in sector $k$ into two groups. The first group contains firms that have a higher ratio of in-strength to out-strength than the target shock, i.e.,

$$\frac{s^{in}_i}{s^{out}_i} > \frac{\xi^u_k\, s^{in,k}}{\xi^d_k\, s^{out,k}}.$$

We assign all firms $i$ of sector $k$ that fulfil this condition to the set $I^{in,k}$. The second group contains firms that have a higher ratio of out-strength to in-strength than the target shock, i.e.,

$$\frac{s^{out}_i}{s^{in}_i} > \frac{\xi^d_k\, s^{out,k}}{\xi^u_k\, s^{in,k}}.$$

We assign all firms $i$ of sector $k$ that fulfil this condition to the set $I^{out,k}$. Edge cases having exactly the same ratio can be added to the group with fewer firms. Then we define a rescaling factor for the in-strength "heavy" firms, $v_{in}$, that rescales all $\zeta^l_i$ where $i \in I^{in,k}$, and a rescaling factor for the out-strength "heavy" firms, $v_{out}$, that rescales all $\zeta^l_i$ where $i \in I^{out,k}$. If we increase $v_{in}$ while leaving $v_{out}$ untouched, the shock scenario, $\zeta^l$, will result in a higher loss of in-strength relative to the loss of out-strength of sector $k$ and therefore in a larger upstream shock relative to the size of the downstream shock. If we increase $v_{out}$ while leaving $v_{in}$ untouched, the shock scenario, $\zeta^l$, will result in a higher loss of out-strength relative to the loss of in-strength of sector $k$ and therefore in a larger downstream shock relative to the size of the upstream shock. Now we only need to determine the weights $v_{in}$ and $v_{out}$ such that the first two constraints in problem statement [S.2] hold exactly.
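In principle the weights $v_{in}$ and $v_{out}$ can be found by solving the following linear system of equations,

$$v_{in} \sum_{i \in I^{in,k}} \zeta^l_i\, s^{in}_i + v_{out} \sum_{i \in I^{out,k}} \zeta^l_i\, s^{in}_i = \xi^u_k\, s^{in,k}, \tag{S.4}$$

$$v_{in} \sum_{i \in I^{in,k}} \zeta^l_i\, s^{out}_i + v_{out} \sum_{i \in I^{out,k}} \zeta^l_i\, s^{out}_i = \xi^d_k\, s^{out,k}. \tag{S.5}$$

The linear system [S.4-S.5] can be written in standard matrix form as

$$A v = b, \qquad A = \begin{pmatrix} \sum_{i \in I^{in,k}} \zeta^l_i\, s^{in}_i & \sum_{i \in I^{out,k}} \zeta^l_i\, s^{in}_i \\ \sum_{i \in I^{in,k}} \zeta^l_i\, s^{out}_i & \sum_{i \in I^{out,k}} \zeta^l_i\, s^{out}_i \end{pmatrix}, \quad v = \begin{pmatrix} v_{in} \\ v_{out} \end{pmatrix}, \quad b = \begin{pmatrix} \xi^u_k\, s^{in,k} \\ \xi^d_k\, s^{out,k} \end{pmatrix}.$$

The system is not always directly solvable for a given vector, $\zeta^l$, that results from Algorithm 1.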
In Algorithm 2 we show how to find the rescaling weights, $v = (v_{in}, v_{out})^\top$, for a given $\zeta^l$. For each firm in industry $k$, i.e., the set $I_k = \{i \mid p_i = k\}$, we initialize the algorithm with the elements of the shock vector, $\zeta^l_i$, that result from Algorithm 1. Then we calculate the size of the violation of the first two constraints in problem statement [S.2], i.e., the distance to the targeted upstream shock size, $o_{in} = \left| \sum_{i \in I_k} \zeta^l_i\, s^{in}_i - \xi^u_k\, s^{in,k} \right|$, and the distance to the targeted downstream shock size, $o_{out} = \left| \sum_{i \in I_k} \zeta^l_i\, s^{out}_i - \xi^d_k\, s^{out,k} \right|$, where $|\cdot|$ denotes the absolute value. We define the "available for rescaling" indicator vector, $d$, where $d_i = 0$ indicates that the shock, $\zeta^l_i$, can be rescaled, and $d_i = 1$ indicates that it cannot be rescaled because $\zeta^l_i$ was scaled above 1 in a previous iteration. Initially we set $d_i \leftarrow 0 \;\forall\, i \in I_k$, i.e., all firm shocks can initially be rescaled.
We continue the following calculations until the distance to the targeted upstream and downstream shock becomes smaller than a threshold $\epsilon$, i.e., the algorithm stops when $o_{in} \le \epsilon$ and $o_{out} \le \epsilon$. We set the parameter $\epsilon$ to 0.01, such that in absolute monetary terms the difference in shocks becomes smaller than 10 Forint (approx. 0.025 Euros).
First, we calculate the remaining target shock size, $b = (b_{in}, b_{out})$. $b$ is the respective upstream or downstream shock target, $(\xi^u_k, \xi^d_k)$, reduced by the respective in-strength or out-strength of firms that are no longer available for rescaling. $b_{in} \leftarrow \xi^u_k\, s^{in,k} - \sum_{i \in I_k} d_i\, s^{in}_i$ specifies the size of the targeted in-strength shock that remains after deducting the in-strength of firms that already received a 100% shock, i.e., where $\zeta^l_i = 1$ and therefore $d_i = 1$. Analogously, $b_{out} \leftarrow \xi^d_k\, s^{out,k} - \sum_{i \in I_k} d_i\, s^{out}_i$ specifies the size of the targeted out-strength shock that remains after deducting the out-strength of firms that already received a 100% shock, i.e., where $\zeta^l_i = 1$ and therefore $d_i = 1$. The variables $b_{in}$ and $b_{out}$ need to be calculated in every iteration, because the change in the remaining target shock size, $b$, affects which firms belong to the set of "in-strength-heavy" firms and the set of "out-strength-heavy" firms. Hence, we update these two sets by setting $I^{in,k}$ to include all firms $i$ where $s^{in}_i / s^{out}_i > b_{in} / b_{out}$ and $I^{out,k}$ to include all $i$ where $s^{out}_i / s^{in}_i > b_{out} / b_{in}$. Edge cases can again be added to the group with fewer firms.
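Next, we need to update the values of the coefficient matrix $A$. The values are updated because firms that have already received a full shock ($d_i = 1$) are no longer considered for rescaling, i.e., we sum only over firms where $d_i = 0$. We calculate

$$A = \begin{pmatrix} \sum_{i \in I^{in,k}} \mathbb{1}(d_i = 0)\, \zeta^l_i\, s^{in}_i & \sum_{i \in I^{out,k}} \mathbb{1}(d_i = 0)\, \zeta^l_i\, s^{in}_i \\ \sum_{i \in I^{in,k}} \mathbb{1}(d_i = 0)\, \zeta^l_i\, s^{out}_i & \sum_{i \in I^{out,k}} \mathbb{1}(d_i = 0)\, \zeta^l_i\, s^{out}_i \end{pmatrix},$$

where $\mathbb{1}(d_i = 0)$ is the indicator variable that is one if firm $i$ can be rescaled and zero if firm $i$ cannot be rescaled anymore. The system has a solution when the rank of $A$ equals the rank of the augmented matrix $(A|b)$.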
We list the four cases in which the shocks, $\zeta^l_i$, lead to a violation of the rank condition for the matrix $A$. First, if no firm $i$ that has positive in-strength and is available for rescaling (i.e., where $d_i = 0$) receives a shock, then the first row is zero. If additionally $b_{in} > 0$, the system has no solution. We can remedy this case by drawing a new shock for a firm that has previously not received a shock and has positive in-strength. Second, if no firm that has positive out-strength and is available for rescaling (i.e., where $d_i = 0$) receives a shock, then the second row is zero. If additionally $b_{out} > 0$, the system has no solution. We can remedy this case by drawing a new shock for a firm that has previously not received a shock and has positive out-strength. These two cases do not happen with the initially drawn shocks, $\zeta^l_i$, because of the condition in the while statement of Algorithm 1, but they can occur during the adjustment procedure in Algorithm 2, because the summations depend on the indicator variable $d_i$. Third, if no firm from the group with a high in- to out-strength ratio, $I^{in,k}$, receives a shock, then the first column of $A$ is zero, which usually leads to an unsolvable system. We can remedy this case by drawing a new shock for a firm $i$ that has previously not received a shock, $\zeta^l_i = 0$, and belongs to the set $I^{in,k}$. Fourth, if no firm from the group with a high out- to in-strength ratio, $I^{out,k}$, receives a shock, then the second column of $A$ is zero, which usually leads to an unsolvable system. We can remedy this case by drawing a new shock for a firm $i$ that has previously not received a shock, $\zeta^l_i = 0$, and belongs to the set $I^{out,k}$. The last two cases can occur since we do not specifically avoid them in Algorithm 1. If an additional shock was drawn, the matrix $A$ needs to be updated again.
Next, we can solve the linear system of equations $Av = b$ by computing the generalized inverse, $A^\dagger$, of $A$ and setting $v \leftarrow A^\dagger b$. Then we rescale the elements of the shock vector $\zeta^l$ in the following way. For the firms $i$ that belong to the "in-strength-heavy" group, $i \in I^{in,k}$, and are still available for rescaling (where $d_i = 0$), we set $\zeta^l_i \leftarrow v_{in}\, \zeta^l_i$. For the firms $i$ that belong to the "out-strength-heavy" group, $i \in I^{out,k}$, and are still available for rescaling ($d_i = 0$), we set $\zeta^l_i \leftarrow v_{out}\, \zeta^l_i$. Then we update the indicator variable by setting $d_i \leftarrow 1$ for all $i$ with $\zeta^l_i \ge 1$. To ensure that shocks are not larger than one we take the minimum with 1, i.e., we set $\zeta^l_i \leftarrow \min[1,\, \zeta^l_i]$. Finally, we update the distances to the target shock, $o_{in}$ and $o_{out}$. We have implemented Algorithm 2 sufficiently fast to sample the 1,000 shocks for each of the approx. 245,000 firms within a few hours. Note that the common rescaling of many firm shocks at the same time with the same factors $v$ might not lead to a full traversal of the space of all possible firm-level shocks that are consistent with our sampling problem [S.2]. This means that in practice, for one specific industry-level shock, the heterogeneity of production losses computed on the firm level could be even larger. We have checked that the resulting shocks are uncorrelated on the firm level and perfectly correlated (identical) when aggregated to the industry level.
Algorithm 2: Rescaling weights for shocks of firms in industry $k$.
Calculate the remaining absolute shock that is left after deducting the strength of fully scaled-up firms $i$ where $d_i = 1$.
7: Set $I^{in,k}$ to include all $i$ where $s^{in}_i / s^{out}_i > b_{in} / b_{out}$. (Update "in-strength-heavy" group.)
8: Set $I^{out,k}$ to include all $i$ where $s^{out}_i / s^{in}_i > b_{out} / b_{in}$. (Update "out-strength-heavy" group.)
Update the distance from the targeted shock.

SI Section 10. Results on linear shock propagation
In this section we show how production losses propagate differently on the firm-level and industry-level production networks when all firms and industries have only linear production functions. As pointed out in the main text, each firm $i$ is equipped with a generalized Leontief production function (GLPF) as defined in Eq. [6], where $I^{es}_i$ is the set of essential inputs and $I^{ne}_i$ is the set of non-essential inputs of firm $i$. The linear production function is a special case of the GLPF where all inputs are in the set of non-essential inputs, $I^{ne}_i$. We simulate the shocks for the case where, for all firms $i$, all inputs $k \in \{1, 2, \dots, m\}$ belong to $I^{ne}_i$. We show the estimation errors for network-wide production losses from simulating the shock propagation on the IPN, $Z$, instead of on the FPN, $W$. Fig. S16 shows the distribution of network-wide production losses, $L_{\mathrm{firm}}(\psi^l)$, in response to the 1,000 synthetic COVID-19 shock scenarios $\Psi$ (defined in the main text) as histogram and boxplot; loss bins, $L_{\mathrm{firm}}(\psi^l)$, are on the x-axis and the frequency of the losses in the respective bins on the y-axis. The variability of losses is economically substantial and ranges from 9.51% to 10.73%, a factor of 1.13. The median and mean losses are 10% each. The variation is substantially smaller than for the case with the GLPF shown in Fig. 4, where losses differ by a factor of up to 1.46 across different shocks. Note again that the GDP growth in Hungary for Q2 2020 was -14.2%, indicating a realistic order of magnitude, but a substantial underestimation. Note again that the initial shocks all have the same monetary size and are identical at the industry level, i.e., the variation of losses is merely due to the fact that different firms within sectors are initially shocked. The distribution is slightly right skewed with a right tail of larger losses. The right tail is substantially smaller than for the GLPF case. The production loss, $L_{\mathrm{firm}}(\psi) = 9.6\%$, corresponding to the labor shock, $\psi$, (red vertical solid line) lies below the median of the loss distribution.
The IPN-based production losses, $L_{\mathrm{ind.}}(\phi)$, are shown as a vertical blue dashed line. As in the main text, since firm-level shocks are by construction identical when aggregated to the NACE2 level, each of the 1,000 shock scenarios leads to exactly the same production loss of 5.5% when propagating on the NACE2 level IPN, $Z$. Interestingly, the IPN-estimated production losses, $L_{\mathrm{ind.}}(\phi)$, are substantially smaller than the distribution of FPN-estimated production losses, $L_{\mathrm{firm}}(\Psi)$. Therefore, the aggregated network, $Z$, not only cannot capture the variation of production losses on the firm-level network, $W$, but the overall level of shock propagation is also underestimated substantially. To quantify the error of estimating the FPN-based production loss, $L_{\mathrm{firm}}(\Psi)$, with the corresponding IPN-based production loss, $L_{\mathrm{ind.}}(\phi)$, we calculate the mean absolute error (deviation). We find that the average estimation error is -45.35% ($E\left[ L_{\mathrm{ind.}}(\phi) / L_{\mathrm{firm}}(\Psi) - 1 \right]$). For the Hungarian production network and the initial shocks, industry-level network shock propagation tends to substantially and systematically underestimate losses from firm-level shock propagation also when production functions are linear.

Fig. 5 shows the distribution of industry-specific production losses, $L^k_{\mathrm{firm}}(\Psi)$, in response to the 1,000 synthetic COVID-19 shock scenarios, $\Psi$, as boxplots. Each boxplot corresponds to an industry, $k$, with the NACE2 code on the x-axis; the y-axis denotes the losses, $L^k_{\mathrm{firm}}(\Psi)$, of the respective NACE2 codes. The mean over industry-specific median (mean) losses is 10% (10.3%). The red 'x' symbols represent the production losses, $L^k_{\mathrm{firm}}(\psi)$, corresponding to the original labor shock, $\psi$, and lie within the boxes. We clearly see that for many industries remaining production levels vary strongly across initial shocks and that the level of variation is very different across industries. The production loss distributions are right skewed, indicated by extended upper vertical lines (whiskers), for all but two industries (H53, N82). A few industries (B05, B06, C15, K65, M75, Q87, and R92) have a substantial number of outliers (grey dots) that lie outside of 3 times the interquartile range. The minimum and maximum values can differ by factors of up to 9.5 (B06), 7.7 (C12), 5.9 (C30), 5.1 (J61), 41.1 (K65), or 25.8 (Q87). The median (mean) ratio of maximum to minimum loss is 1.27 (1.58). Again, these large deviations do not stem from different sizes of initial shocks, but from affecting different firms within industries. Note that for some sectors the factors representing the relative variation (maximum loss / minimum loss) are even higher for the case of only linear shock propagation. This is due to the fact that the minimum losses are smaller for the linear shock propagation, but the maximum losses are not affected by the non-linearities of the GLPF, i.e., the ratios are larger.

Fig. 5 shows that the IPN-based industry-specific production losses, $L^k_{\mathrm{ind.}}(\phi)$, (blue '+' symbols) deviate even more strongly from the FPN-based losses than for network-wide losses. The sectors where IPN-based shock propagation underestimates output losses the most are C6 (-62.6%), C26 (-60.7%), C29 (-62%), C33 (-61.5%), K64 (-66.8%), K65 (-75.5%), and M69 (-68.6%), with the negative average relative deviation in parentheses. Overestimation of losses is highest for sectors C12 (95.3%), C21 (70%), E36 (66.3%), and 87 (46%). The average across the mean absolute deviations of industries is 31.1%.
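The network-wide estimation error, $E[L_{\mathrm{ind.}}(\phi)/L_{\mathrm{firm}}(\Psi) - 1]$, can be computed as in the following minimal sketch. The function name `mean_relative_error` is hypothetical, and the three firm-level losses are only the minimum, median and maximum values quoted in this section, used as an illustrative stand-in for the full set of 1,000 simulated losses.

```python
def mean_relative_error(loss_ind, losses_firm):
    """Average of L_ind / L_firm - 1 over all firm-level loss scenarios."""
    return sum(loss_ind / lf - 1.0 for lf in losses_firm) / len(losses_firm)


loss_ind = 0.055                       # IPN-based loss (5.5%)
losses_firm = [0.0951, 0.10, 0.1073]   # illustrative FPN-based losses

err = mean_relative_error(loss_ind, losses_firm)
print(f"average relative estimation error: {err:.1%}")
```

A negative value means that the industry-level network underestimates the losses obtained on the firm-level network.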

Figure 2: Pairwise similarity distributions of input and output vectors for firms of the NACE class 26, 'manufacture of computer, electronic and optical products'. a-d) show input vector overlap coefficients, $\mathrm{IOC}_{ij}$, and e-h) output vector overlap coefficients, $\mathrm{OOC}_{ij}$, for four in-degree, $k^{in}_i$ (number of suppliers), and out-degree, $k^{out}_i$ (number of buyers), bins, respectively. Vertical solid lines correspond to the median, dashed lines to the average overlap coefficients. a) $\mathrm{IOC}_{ij}$ for 351 firms with $1 \le k^{in}_i \le 5$ suppliers; b) 102 firms with $6 \le k^{in}_i \le 15$ suppliers; c) 49 firms with $16 \le k^{in}_i \le 35$ suppliers; d) 62 firms with more than 35 suppliers. It is clearly visible that the similarity of input vectors is low for all numbers of suppliers, but increases on average with the number of suppliers. e) $\mathrm{OOC}_{ij}$ distribution for 468 firms with $1 \le k^{out}_i \le 5$ customers; f) 118 firms with $6 \le k^{out}_i \le 15$ customers; g) 33 firms with $16 \le k^{out}_i \le 35$ customers; and h) 13 firms with more than 35 customers. The similarity of output vectors is even lower than for input vectors, and also increases on average with the number of buyers. If industry-level aggregation were fully representative for the IO-vectors of firms in NACE C26, the distributions in all panels would correspond to one single bar at an overlap value of 1.

Figure 3: Pairwise similarity distributions of input- and output-vectors of firms within all NACE2 industries. The overlap coefficient is computed for firms with more than 35 suppliers (a) and buyers (b), respectively. The dark blue horizontal bars in the boxplots correspond to the median ($p_{50\%}$), dark blue vertical lines to the inter-quartile range ($p_{25\%}$ to $p_{75\%}$), and thin light blue vertical lines to error bars ($p_{5\%}$ to $p_{95\%}$). Thin black vertical lines separate the NACE1 classes. Empty columns indicate sectors with fewer than two firms in this degree bin. a) Intra-industry input overlap coefficients, $\mathrm{IOC}_{ij}$. The average of the mean (median) input overlaps across all NACE2 industries is 0.35 (0.33) and the standard deviation of mean (median) input overlaps is 0.084 (0.102). The average standard deviation is 0.156. Relatively low input overlaps are the norm with few outliers such as 'agricultural industry' (A1-A2), 'water collection, treatment and supply' (E36) and 'transport' (H53). b) Intra-industry output overlap coefficients, $\mathrm{OOC}_{ij}$. The average of the mean (median) output overlaps across all NACE2 industries is 0.282 (0.257) and the standard deviation of mean (median) output overlaps is 0.147 (0.161). Again we see small overlaps. Output overlaps are on average lower than the input overlaps, but there appears to be more variation across industries. If industry-level aggregation were fully representative for the IO-vectors of firms, all distributions in both panels would correspond to a single bar at an overlap value of 1. Not a single industry is even close to that value; the highest similarities are found for sectors such as Veterinary activities (M75), Manufacture of beverages (C11), and Manufacture of other transport equipment (C30).

Figure S8: Pairwise similarity distributions of input and output vectors for firms of NACE class 26, Manufacture of computer, electronic and optical products, measured with the Jaccard Index. a-d) show input Jaccard Indices, $\mathrm{IJI}_{ij}$, and e-h) output Jaccard Indices, $\mathrm{OJI}_{ij}$, visualized as histograms, for four in-degree, $k^{in}_i$ (number of suppliers), and out-degree, $k^{out}_i$ (number of buyers), bins, respectively. Jaccard Index values are on the x-axis in bins of width 0.05; the y-axis shows the frequency of falling in the respective bin. Vertical solid lines correspond to median and dashed lines to mean overlap coefficients. a) pairwise $\mathrm{IJI}_{ij}$ for 351 firms with $1 \le k^{in}_i \le 5$. The median and mean input Jaccard Index is 0 and 0.141, respectively; the standard deviation is 0.261. b) pairwise $\mathrm{IJI}_{ij}$ for 102 firms with $6 \le k^{in}_i \le 15$. The median and mean input Jaccard Index is 0.2 and 0.204, respectively; the standard deviation is 0.109. c) pairwise $\mathrm{IJI}_{ij}$ for 49 firms with $16 \le k^{in}_i \le 35$. The median and mean input Jaccard Index is 0.231 and 0.237, respectively; the standard deviation is 0.091. d) pairwise $\mathrm{IJI}_{ij}$ for 62 firms with $35 < k^{in}_i$. The median and mean input Jaccard Index is 0.425 and 0.43, respectively; the standard deviation is 0.119. It is clearly visible that the similarity of input vectors is low for all size bins, but increases on average with the number of suppliers. e) pairwise $\mathrm{OJI}_{ij}$ for 468 firms with 1 ≤ k out

Figure S10: Distribution of input and output overlap coefficients of firms' input- and output-vectors across the years 2019 and 2018 over all NACE2 industries. The overlap coefficients, OC, are on the x-axis and the counts for the respective OC-value bin on the y-axis. a-d) illustrate the distributions of $\mathrm{IOC}_{t,t-1}$ across all NACE2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median (mean) IOCs over time, $\mathrm{IOC}_{t,t-1}$, are 0.805 (0.678), 0.755 (0.712), 0.797 (0.761) and 0.847 (0.814), respectively, indicated by the vertical solid (dashed) lines. The standard deviations for the in-degree bins are 0.345, 0.203, 0.161 and 0.142, respectively, and decrease with the number of in-links. e-h) illustrate the distributions of $\mathrm{OOC}_{t,t-1}$ for the respective out-degree bins. The median (mean) OOCs over time, $\mathrm{OOC}_{t,t-1}$, are 0.922 (0.737), 0.816 (0.778), 0.869 (0.841) and 0.847 (0.814). The standard deviations for the out-degree bins are 0.34, 0.209, 0.163 and 0.128, again decreasing with the number of out-links. The similarity of firms' input- and output-vectors over time is substantially higher than for the pairwise intra-industry similarities.

Figure S11: Distribution of input and output retention probabilities (IRPs and ORPs) of firms for 2019 and 2018 across all industries. The retention probabilities, RPs, are on the x-axis and the counts for the respective RP-value bins on the y-axis. a-d) illustrate the distributions of $IRP_{t,t-1}$ across all NACE2 industries for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median (mean) IRPs over time, $IRP_{t,t-1}$, are 1 (0.847), 1 (0.882), 0.923 (0.902) and 0.952 (0.933), respectively, indicated by the vertical solid (dashed) lines. The means increase with the in-degree. The standard deviations for the in-degree bins are 0.316, 0.173, 0.133 and 0.108, respectively, decreasing with the number of in-links. With increasing in-degree the distributions become more concentrated on the value 1, i.e. most firms retain almost all NACE2 input types. e-h) illustrate the distributions of $ORP_{t,t-1}$ for the respective out-degree bins. The median (mean) ORPs over time, $ORP_{t,t-1}$, are 1 (0.872), 1 (0.883), 1 (0.913) and 0.962 (0.933), i.e. the means increase with the out-degree. The standard deviations for the out-degree bins are 0.295, 0.177, 0.130 and 0.100, again decreasing with the number of out-links. With increasing out-degree the distributions become more concentrated on the value 1, i.e. most firms retain almost all NACE2 customer industries. The similarity of firms' input and output vectors over time is slightly higher than the intra-industry similarities.
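As an illustration of how a retention probability differs from an overlap coefficient, here is a minimal sketch. It assumes that the retention probability is the share of the previous year's NACE2 input (or customer) industries that are still present in the current year, $|A_t \cap A_{t-1}| / |A_{t-1}|$; this reading follows the phrase "retain almost all NACE2 input types" in the caption but is not confirmed by it, and all names and example codes are hypothetical.

```python
def retention_probability(previous: set, current: set) -> float:
    """Share of last year's industries that are still present this year.

    Assumption: defined only for firms with at least one link last year.
    """
    if not previous:
        raise ValueError("previous must contain at least one industry")
    return len(previous & current) / len(previous)


# Hypothetical NACE2 supplier industries of one firm.
inputs_2018 = {"C24", "C25", "C26", "G46"}
inputs_2019 = {"C25", "C26", "G46", "H49"}

# Three of the four 2018 industries are retained; the new one is ignored.
irp = retention_probability(inputs_2018, inputs_2019)
print(irp)
```

Under this assumed definition the measure is asymmetric: a firm that drops three of five input industries has a retention probability of 0.4, although the set-based overlap coefficient of the two years would equal 1.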

Figure S12: Similarity of firms' input and output vectors for 2019 and 2018 for NACE2 class C26, measured with the overlap coefficient (OC). The OC is on the x-axis and the counts for the respective OC-value bins on the y-axis. a-d) illustrate the distributions of $IOC_{t,t-1}$ for firms in NACE2 class C26 for the four in-degree bins (1-5, 6-15, 16-35, >35) as histograms. The median (mean) IOCs over time, $IOC_{t,t-1}$, are 0.718 (0.642), 0.743 (0.686), 0.765 (0.730) and 0.818 (0.759), respectively, indicated by the vertical solid (dashed) lines, and increase with the in-degree. The standard deviations for the in-degree bins are 0.34, 0.217, 0.167 and 0.188, respectively, and, except for the last bin, decrease with the number of in-links. e-h) illustrate the distributions of $OOC_{t,t-1}$ for the respective out-degree bins. The median (mean) OOCs over time, $OOC_{t,t-1}$, are 0.857 (0.719), 0.759 (0.727), 0.801 (0.740) and 0.768 (0.768). Only the means increase with the out-degree, not the medians. The standard deviations for the out-degree bins are 0.321, 0.206, 0.176 and 0.115, again decreasing with the number of out-links. The similarity of firms' input and output vectors over time is substantially higher than the intra-industry similarities.

Figure S13: Similarity distributions of input and output vectors of firms between 2019 and 2018 for each NACE2 industry. Similarity is measured with the overlap coefficient for firms with more than 35 suppliers (a) and buyers (b), respectively. The y-axis denotes the overlap coefficients between the two years; the x-axis shows the NACE2 code for the respective boxplots. The dark blue horizontal bars correspond to the median ($p_{50\%}$), dark blue vertical lines to the interquartile range ($p_{25\%}$ to $p_{75\%}$), and thin light blue vertical lines to error bars ($p_{5\%}$ to $p_{95\%}$). Thin black vertical lines separate NACE1 classes. Empty columns indicate no firms in this degree bin. a) distributions of firms' input overlap coefficients, $IOC_{t,t-1}$. The average of the mean (median) input overlaps across NACE2 industries is 0.784 (0.807) and the standard deviation of the mean (median) input overlaps is 0.099 (0.102). The average standard deviation is 0.124. This indicates that high input overlaps are the norm with few outliers. b) distributions of firms' output overlap coefficients, $OOC_{t,t-1}$. The average of the mean (median) output overlaps across NACE2 industries is 0.813 (0.828) and the standard deviation of the mean (median) output overlaps is 0.101 (0.103), indicating that high output overlaps are the norm with few outliers. The average standard deviation is 0.108. Output overlaps are on average slightly higher than input overlaps.
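The three summary numbers quoted per panel (average of industry means or medians, standard deviation of industry means or medians, and average within-industry standard deviation) can be sketched as follows. This is only an illustration of the aggregation order, assuming sample standard deviations and at least two firms per industry; the function name and the toy data are hypothetical and not taken from the paper.

```python
from statistics import mean, median, stdev


def summarize_across_industries(overlaps_by_industry):
    """Aggregate per-industry overlap statistics across industries.

    Assumption: sample standard deviations (statistics.stdev) are used,
    so every industry needs at least two firms.
    """
    values = list(overlaps_by_industry.values())
    means = [mean(v) for v in values]
    medians = [median(v) for v in values]
    stds = [stdev(v) for v in values]
    return {
        "avg_of_means": mean(means),
        "avg_of_medians": mean(medians),
        "std_of_means": stdev(means),
        "std_of_medians": stdev(medians),
        "avg_std": mean(stds),
    }


# Toy overlap coefficients of firms in three hypothetical industries.
toy = {
    "C26": [0.8, 0.9, 1.0],
    "C27": [0.6, 0.7, 0.8],
    "C28": [0.7, 0.8, 0.9],
}
summary = summarize_across_industries(toy)
print(summary)
```

A small standard deviation of the industry means together with a small average within-industry standard deviation is what supports the reading that similar overlap levels are the norm across industries.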

Figure S14: Distributions of input vector overlaps, $IOC_{t,t-1}$, of firms across NACE2 industries for the years 2019 and 2018. NACE2 classes are on the x-axis; overlap coefficients on the y-axis. a) distributions of input overlap coefficients, $IOC_{t,t-1}$, for firms with in-degree between 1 and 5, $1 \le k_i^{in} \le 5$. The mean over the industries' mean (median) $IOC_{t,t-1}$ is 0.660 (0.763); the standard deviation of mean (median) IOCs is 0.079 (0.104). The mean standard deviation is 0.334. b) distributions of input overlap coefficients, $IOC_{t,t-1}$, for firms with in-degree between 6 and 15, $6 \le k_i^{in} \le 15$. The mean over the industries' mean (median) $IOC_{t,t-1}$ is 0.692 (0.729); the standard deviation of mean (median) IOCs is 0.097. The mean standard deviation is 0.184. c) distributions of input overlap coefficients, $IOC_{t,t-1}$, for firms with in-degree between 16 and 35, $16 \le k_i^{in} \le 35$. The mean over the industries' mean (median) $IOC_{t,t-1}$ is 0.730 (0.094); the standard deviation of mean (median) IOCs is 0.094 (0.096). The mean standard deviation is 0.162.

Figure S15: Distributions of output vector overlaps, $OOC_{t,t-1}$, of firms across NACE2 industries for the years 2019 and 2018. NACE2 classes are on the x-axis; overlap coefficients on the y-axis. a) distributions of output overlap coefficients, $OOC_{t,t-1}$, for firms with out-degree between 1 and 5, $1 \le k^{out}$

Figure S16: Economy-wide production losses, $L$, obtained from an empirically calibrated and 1,000 synthetic COVID-19 shocks propagating linearly on the aggregated industry-level production network, IPN (blue dashed line), and on the firm-level production network, FPN (red line, histogram). The FPN and IPN correspond to the production network of Hungary in 2019; the firm-level shock, $\psi$, corresponds to firms reducing their production level in proportion to their reduction in employees between January and May 2020, and is taken from monthly firm-level labor data. The NACE2-level shock, $\phi$, is the aggregation of $\psi$. The 1,000 synthetic shocks, $\Psi$, are sampled such that, when aggregated to the NACE2 level, they all have the same size as $\phi$. The empirically calibrated shock, $\psi$, yields an FPN-based loss, $L_{firm}(\psi)$, of 9.6% (red line). The synthetic shocks yield a distribution of FPN-based production losses, $L_{firm}(\Psi)$, ranging from 9.5% to 10.7% of national output (histogram). The median is 10% (see boxplot). As a reference, the Hungarian GDP declined by 14.2% in Q2 2020. Note that for the IPN all realizations, $\Psi$, result in the same production loss, $L_{ind.}(\phi)$, of 5.5%, by construction. The aggregation to the IPN causes a substantial underestimation of the FPN-based production losses.
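The size of the gap between the two networks can be made explicit with a short calculation on the numbers quoted in this caption. The sketch assumes that underestimation is measured relative to the FPN-based loss, $(L_{firm} - L_{ind.}) / L_{firm}$; this ratio is chosen for illustration only and is not necessarily the measure behind percentages reported elsewhere in the paper. The function name is hypothetical.

```python
def relative_underestimation(loss_fpn: float, loss_ipn: float) -> float:
    """Share of the FPN-based loss that the IPN-based loss misses.

    Assumption: the FPN-based loss is the reference value.
    """
    return (loss_fpn - loss_ipn) / loss_fpn


# Losses in percent of national output, as quoted in the caption.
L_IND = 5.5
fpn_losses = {
    "empirical shock": 9.6,
    "synthetic minimum": 9.5,
    "synthetic median": 10.0,
    "synthetic maximum": 10.7,
}

results = {
    label: relative_underestimation(l_firm, L_IND)
    for label, l_firm in fpn_losses.items()
}
for label, value in results.items():
    print(f"{label}: {value:.1%}")
```

Under this assumed measure, the IPN-based loss of 5.5% misses somewhat more than two fifths of the FPN-based loss for the linear propagation shown in this figure.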

Figure S17: Comparison of industry-specific production losses, $L^k$, obtained from an empirically calibrated and 1,000 synthetic COVID-19 shocks propagating linearly on the aggregated industry-level production network, IPN (blue '+'es), and on the firm-level production network, FPN (red 'x'es, boxplots). For most industries the FPN-based production losses, $L^k_{firm}(\Psi)$ (boxplots), vary substantially, and for a few strongly, across the synthetic shocks, even though the shocks have the same size on the industry level. Shock propagation on the industry level (blue '+'es) cannot capture this variation. IPN-based production losses typically underestimate the FPN-based production losses significantly, on average by about 31.1%.
Dark blue horizontal bars indicate the median ($p_{50\%}$), thick dark blue vertical lines indicate the interquartile range ($p_{25\%}$ to $p_{75\%}$), thin light blue vertical lines indicate error bars ($p_{5\%}$ to $p_{95\%}$), and thin vertical black lines separate NACE1 class affiliations. Empty columns indicate that

2: Set $\zeta_i^l \leftarrow 0$ for all $i \in I_k$ ▷ Initialize the algorithm by setting all shocks to zero.
    Draw a shock $\eta \sim \{\zeta_j \mid j \in I_k\}$ ▷ Draw a shock $\zeta_j$ from the empirical distribution of shocks of sector $k$.
    ▷ Update the shock of firm $i$ with the additional drawn shock.
9:  if $\tilde{I}_k = \emptyset$ then ▷ If each firm has received a shock and the aggregate shock is still too small.
10:     Set $\tilde{I}_k \leftarrow \{i \mid p_i = k\}$ ▷ Fill up the index set again and continue to draw shocks.
11: end if
12: end while
13: return $\zeta_i^l$ for $i \in I_k$ ▷ Return the shock vector and use it as input for Algorithm 2.
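The surviving steps of this sampling procedure can be sketched in Python as follows. Several steps are missing from the listing, so the sketch fills them with explicit assumptions: firms are picked uniformly at random from the remaining index set, the drawn shock is added to the firm's current shock and capped at 1, and the aggregate sector shock is a size-weighted average of firm shocks. The function name, its parameters, and the example values are hypothetical; this is not the authors' implementation.

```python
import random


def sample_sector_shock(empirical_shocks, weights, target, rng):
    """Draw one synthetic shock vector for the firms of a single sector.

    empirical_shocks: observed firm-level shocks of the sector, each in [0, 1]
    weights:          firm sizes used for aggregation (assumption)
    target:           aggregate sector shock to be reached, in [0, 1]
    rng:              a random.Random instance
    """
    n = len(empirical_shocks)
    total_weight = sum(weights)
    if n == 0 or total_weight <= 0:
        raise ValueError("sector needs firms with positive total weight")
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    if target > 0.0 and max(empirical_shocks) <= 0.0:
        raise ValueError("cannot reach a positive target from zero shocks")

    shocks = [0.0] * n          # step 2: initialize all shocks to zero
    remaining = list(range(n))  # firms not yet shocked in this round

    def aggregate():
        # Size-weighted average shock of the sector (assumption).
        return sum(w * s for w, s in zip(weights, shocks)) / total_weight

    while aggregate() < target:
        # Pick a firm uniformly from the remaining index set (assumption).
        i = remaining.pop(rng.randrange(len(remaining)))
        # Draw a shock from the empirical distribution of the sector.
        eta = rng.choice(empirical_shocks)
        # Additive update, capped at full shutdown (assumption).
        shocks[i] = min(1.0, shocks[i] + eta)
        # Steps 9-10: every firm was shocked once, refill the index set.
        if not remaining:
            remaining = list(range(n))
    return shocks


observed = [0.0, 0.1, 0.5, 1.0]  # hypothetical empirical shocks
sizes = [1.0, 2.0, 3.0, 4.0]     # hypothetical firm sizes
synthetic = sample_sector_shock(observed, sizes, 0.3, random.Random(42))
print(synthetic)
```

The returned vector reaches or overshoots the target, which is consistent with the listing handing its output to a second algorithm that rescales the shocks.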
k ▷ Calculate the distance from the targeted shock.
4: set $d_i \leftarrow 0 \;\forall\, i \in I_k$ ▷ All shocks, $\zeta_i^l \;\forall\, i \in I_k$, are available for rescaling.
5: while $o^{in} >$ and $o^{out} >$ do ▷ Iterate until the distance to the target up- and downstream shock size is small.
6: